1  Summary


This  grant  supported  research  in  the  area  of  robust  mobile  multimedia  communications  with  two 
focus  areas: 

•  Time-varying convolutional codes and their application to turbo codes.

•  Joint-source  channel  coding  for  multimedia  communications. 

The  work  performed  under  this  grant  led  to  significant  contributions  in  both  areas  and  resulted 
in  three  (3)  conference  papers  and  three  (3)  journal  papers.  A  detailed  discussion  of  the  technical 
results  forms  the  majority  of  this  report  and  begins  in  Section  2. 

The grant and matching funds provided partial support for three (3) faculty members and six (6)
graduate students, including two U.S. citizens. The work performed by these students has resulted
in three (3) Master's degrees and will result in an additional two (2) Master's degrees and one (1)
Doctoral degree. In addition, work performed under this grant is included in the book Trellis
and Turbo Coding by Christian B. Schlegel, of the University of Alberta, and Lance C. Perez, of
the University of Nebraska, Lincoln, to be published by the IEEE Press/John Wiley and Sons in
the Fall of 2003.

1.1  Personnel  Supported 

Principal Investigator: Lance C. Perez, Ph.D.

Co-Principal Investigator: Michael W. Hoffman, Ph.D.

Co-Principal Investigator: Khalid Sayood, Ph.D.

Research  Assistant:  Qian  Hu,  M.S.E.E.,  University  of  Nebraska,  Lincoln,  December  2000. 

Research  Assistant:  Gulay  Ozkan,  M.S.E.E.,  University  of  Nebraska,  Lincoln,  December  2000. 

Research  Assistant:  Devrim  Ayilldiz,  M.S.E.E., University  of  Nebraska,  Lincoln,  May  2001. 

Research Assistant: Christopher G. Hruby, M.S.E.E., University of Nebraska, Lincoln, expected
August 2003.

Research Assistant: Matthew Becker, M.S.E.E., University of Nebraska, Lincoln, expected
August 2003.

Research Assistant: Fan Jiang, Ph.D., University of Nebraska, Lincoln, expected August 2004.
1.2  Technical  Publications


Journal  Publications 


1.  Q. Bao, M. W. Hoffman, and K. Sayood, “Multiple description imaged coding by vector
quantization”, submitted to the IEEE Transactions on Circuits and Systems for Video Technology.

2.  Q.  Hu  and  L.  C.  Perez,  “Time-varying  convolutional  codes  that  meet  the  Heller  bound”,  in 
preparation  for  the  IEEE  Transactions  on  Information  Theory. 

3.  F.  Jiang,  M.  Becker,  and  L.  C.  Perez,  “Time-varying  turbo  codes”,  in  preparation  for  IEEE 
Communications  Letters. 


Reviewed  Conference  Proceedings 


1.  F. Jiang, M. Becker and L. C. Perez, “Turbo Codes with Time-Varying Component Codes”,
2003 Conference on Information Sciences and Systems, The Johns Hopkins University,
Baltimore, MD, March 13, 2003.

2.  Q. Hu and L. C. Perez, “Some Periodic Time-Varying Convolutional Codes with Free
Distance Achieving the Heller Bound,” 2001 International Symposium on Information Theory,
Washington D.C., June 28, 2001.

3.  Q. Hu and L. C. Perez, “Time-Varying Convolutional Codes with Large Distance,” 2001
Conference on Information Sciences and Systems, The Johns Hopkins University, Baltimore,
MD, March 21, 2001.


Masters  Theses 


1.  Q.  Bao,  Multiple  Description  Image  Coding  Over  Noise  Channels  by  Vector  Quantization, 
May,  2000. 

2.  G.  Ozkan,  Design  of  Variable  Length  Codes  for  Noisy  Channels,  December  2000. 

3.  Q. Hu, Design and Analysis of Time-Varying Convolutional Codes, December 2000.

4.  M.  Becker,  Practical  Issues  of  Turbo  Coding  and  Iterative  Decoding,  expected  August  2003. 

5.  C. Hruby, Properties of Low Density Parity Check Codes, expected August 2003.

Doctoral  Dissertations 

1.  F. Jiang, Structural Properties and Applications of Time-varying Convolutional Codes,
expected August 2004.
2  Technical  Results


In  the  remainder  of  this  report,  we  detail  the  technical  contributions  made  under  the  support  of 
this  grant.  The  technical  contributions  are  in  the  two  focus  areas 

•  Time-varying convolutional codes and their application to turbo codes.

•  Joint-source  channel  coding  for  multimedia  applications. 


Section 3 discusses results in the area of time-varying convolutional codes. This includes the
discovery of a new time-varying convolutional code that achieves the Heller bound and has free
distance greater than the best time-invariant convolutional code. Section 4 discusses the application
of time-varying convolutional codes as component codes in turbo codes. The work done in this area
resulted in a new time-varying component code that outperforms the celebrated big numerator, little
denominator (BNLD) codes of Massey, Takeshita and Costello [14]. Finally, Section 5 contains a
thorough simulation comparison of state-of-the-art joint-source channel coding schemes for
image transmission.


3  Time-Varying  Convolutional  Codes


3.1  Periodic  Time  Varying  Convolutional  Codes 


In general, a periodic time varying convolutional code is denoted as an (n, k, m, P) code, where n, k,
and m are defined in the same way as for time invariant codes and P is the period of the code. The
ordered set $\{C_i,\ i = 1, 2, \ldots, P\}$ represents the time invariant codes that make up the periodic
time varying code.


Figure  1:  Encoder  for  a  (2, 1,2, 2)  periodic  time  varying  convolutional  code. 
Figure 2: Trellis diagram for a (2, 1, 2, 2) periodic time varying convolutional code.


Figure 1 shows a (2, 1, 2, 2) periodic time varying code, where $E_1$ is the encoder of a (2, 1, 2) time
invariant code, $C_1$, with $g^{(0)} = (1\,0\,1)$ and $g^{(1)} = (1\,1\,1)$, and $E_2$ is the encoder of a code, $C_2$,
with $g^{(0)} = (1\,0\,0)$ and $g^{(1)} = (1\,1\,1)$. The input, u, enters both $E_1$ and $E_2$, while the output v
switches between the outputs of $E_1$ and the outputs of $E_2$. The states of $E_1$ and $E_2$ are the
same at all times; equivalently, $E_1$ and $E_2$ share the same memory. Therefore,
$E_1$ and $E_2$ have the same transitions in the state diagram but with different branch labels.

The trellis of the periodic time varying code is obtained by expanding the state diagram in time
and labeling branches alternately with the branch labels of $E_1$ and $E_2$. Figure 2 shows the trellis
of the encoder in Figure 1. Let the input sequence be $u_{[0,5)} = (1\,0\,1\,0\,0)$. The encoding of this
sequence by the periodic time varying code is shown in Table 1 and its path through the trellis is the
blue path shown in Figure 2. At time unit 0, “1” goes into $E_1$ and $E_2$, the state changes from $S_0$ to $S_2$ and
the output is (1 1) since $E_1$ is on duty. Then at time unit 1, “0” enters and the state changes to
$S_1$. The output switches to $E_2$ and (0 1) is the output of the periodic time varying code at time
unit 1. The remaining time units follow in the same way.


    Time   Input u   E1 output   E2 output
    0      1         11
    1      10                    01
    2      101       00
    3      1010                  01
    4      10100     11

Table 1: Encoding of the code in Figure 1 with input sequence $u_{[0,5)} = (1\,0\,1\,0\,0)$.
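The encoding in Table 1 can be reproduced with a short script. This is only a sketch under stated assumptions: the function name `encode_periodic` and the data layout are illustrative and do not come from the report, and the generator taps are taken to act on $(u_t, u_{t-1}, u_{t-2})$, which is the reading that matches Table 1.

```python
from typing import List, Sequence, Tuple

Generators = Tuple[Tuple[int, ...], ...]  # one tap vector per output bit

# Component codes of the (2, 1, 2, 2) example; taps act on (u_t, u_{t-1}, u_{t-2}).
C1: Generators = ((1, 0, 1), (1, 1, 1))
C2: Generators = ((1, 0, 0), (1, 1, 1))


def encode_periodic(u: Sequence[int], codes: Sequence[Generators]) -> List[Tuple[int, ...]]:
    """Encode u with a periodic time varying feedforward code.

    All component encoders share one shift register; at time t the
    output block is produced by codes[t % P].
    """
    m = len(codes[0][0]) - 1
    state = [0] * m  # (u_{t-1}, ..., u_{t-m})
    out = []
    for t, bit in enumerate(u):
        window = [bit] + state
        gens = codes[t % len(codes)]
        out.append(tuple(sum(g * x for g, x in zip(gen, window)) % 2 for gen in gens))
        state = window[:-1]  # shift the register by one position
    return out


print(encode_periodic([1, 0, 1, 0, 0], [C1, C2]))
```

The output blocks alternate between the $E_1$ and $E_2$ columns of Table 1 because the component code is selected by `t % P`.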

From Chapter 2, $u_{[i,j)}$ and $v_{[i,j)}$ denote the input and output sequences, respectively, over the
time instants $i, i+1, \ldots, j-1$, and $u_t$ is the k-tuple of input bits at time instant t. The free
distance of a linear time invariant convolutional code is the smallest Hamming distance between
output sequences $v_{[0,\infty)}$ resulting from distinct input sequences $u_{[0,\infty)}$. It is equivalent to the
smallest $w_H(v_{[0,\infty)})$ achievable with $u_{[0,\infty)} \neq 0$, where $w_H(\cdot)$ denotes Hamming weight. For
periodic time varying codes, $d_{free}$ is equal to the smallest $w_H(v_{[0,\infty)})$ achievable with $u_{[0,P)} \neq 0$.
For time invariant codes, $u_{[0,\infty)}$ may be restricted to the information sequences with $u_0 \neq 0$.
This corresponds to all the paths in the trellis diagram that leave $S_0$ at time 0.

The condition for periodic time varying codes, $u_{[0,P)} \neq 0$, states that the information sequence
cannot be all zero from time 0 to P - 1. These information sequences correspond to all the paths
leaving $S_0$ at time 0, 1, ..., or P - 1. For example, the free distance of the code in Figure 1 is 4.
The path which has weight 4 is shown as the red path in Figure 2. This path corresponds to the
information sequence $u_{[0,4)} = (0\,1\,0\,0)$. It is the only path of weight 4. This path leaves $S_0$ at time
1. Note that the first two information bits are (0 1). This satisfies the condition that $u_{[0,P)} \neq 0$
with P = 2.
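The free distance of 4 can be checked numerically. The sketch below is not the search method of the report; it assumes a plain shortest-path (Dijkstra) search over pairs (state, phase), started once for each of the P possible departure times. The names `branch` and `free_distance` are illustrative.

```python
import heapq

# Component codes of the (2, 1, 2, 2) example; taps act on (u_t, u_{t-1}, u_{t-2}).
C1 = ((1, 0, 1), (1, 1, 1))
C2 = ((1, 0, 0), (1, 1, 1))


def branch(state, bit, gens):
    """Return (next_state, Hamming weight of the output block) for one branch."""
    window = (bit,) + state
    weight = sum(sum(g * x for g, x in zip(gen, window)) % 2 for gen in gens)
    return window[:-1], weight


def free_distance(codes):
    """Smallest weight of a path that leaves the zero state at time 0..P-1
    and later remerges with it."""
    P = len(codes)
    m = len(codes[0][0]) - 1
    zero = (0,) * m
    best = None
    for start in range(P):
        state, w = branch(zero, 1, codes[start])  # depart on the "one" branch
        heap = [(w, state, (start + 1) % P)]
        done = set()
        while heap:
            w, state, phase = heapq.heappop(heap)
            if state == zero:  # first remerge popped is the lightest one
                best = w if best is None else min(best, w)
                break
            if (state, phase) in done:
                continue
            done.add((state, phase))
            for bit in (0, 1):
                nxt, bw = branch(state, bit, codes[phase])
                heapq.heappush(heap, (w + bw, nxt, (phase + 1) % P))
    return best


print(free_distance([C1, C2]))  # periodic code of Figure 1
print(free_distance([C1]))      # C1 used alone as a time invariant code
```

Passing a single component code treats it as a time invariant code, which gives a quick comparison against the periodic combination.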

The column distance of a periodic time varying code is defined in almost the same way as it is
in the time invariant case, except that instead of only one column distance $d^c_i$ at each time unit
i, there are P column distances $d^c_i(j)$, j = 1, 2, ..., P, corresponding to the initial code $C_j$,
j = 1, 2, ..., P, at time unit i. That is,

$$d^c_i(j) = \min\{ d_H(v_{[0,i]}, v'_{[0,i]}) : u_0 \neq u'_0,\ C_j \text{ initial} \} = \min\{ w_H(v_{[0,i]}) : u_0 \neq 0,\ C_j \text{ initial} \}$$

where i = 0, 1, ... and j = 1, ..., P. It is easy to verify from the trellis in Figure 2 that
$d^c_2(1) = d^c_2(2) = 3$ for the code in Figure 1.

Like the column distance, there are P distance profiles for a periodic time varying code. They are
defined as

$$d^p(j) = [d^c_0(j),\ d^c_1(j),\ \ldots,\ d^c_m(j)], \quad j = 1, 2, \ldots, P,$$

where j corresponds to the initial code $C_j$. For instance, $d^p(1)$ and $d^p(2)$ for the periodic time
varying code in Figure 1 are

$$d^p(1) = [2, 3, 3]$$

and

$$d^p(2) = [2, 3, 3].$$
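For a code this small, the distance profiles can be checked by exhaustive enumeration. The following is a brute-force sketch, not the method used in the report; `column_distances` is a hypothetical helper, and `start = 0` is taken to correspond to the initial code $C_1$ and `start = 1` to $C_2$.

```python
from itertools import product

# Component codes of the (2, 1, 2, 2) example; taps act on (u_t, u_{t-1}, u_{t-2}).
C1 = ((1, 0, 1), (1, 1, 1))
C2 = ((1, 0, 0), (1, 1, 1))


def column_distances(codes, start, order):
    """Column distances of order 0..order when codes[start] is used at time 0."""
    P = len(codes)
    m = len(codes[0][0]) - 1
    profile = []
    for i in range(order + 1):
        best = None
        # Enumerate every input of length i + 1 whose first bit is 1.
        for tail in product((0, 1), repeat=i):
            u = (1,) + tail
            state, total = (0,) * m, 0
            for t, bit in enumerate(u):
                window = (bit,) + state
                gens = codes[(start + t) % P]
                total += sum(sum(g * x for g, x in zip(gen, window)) % 2 for gen in gens)
                state = window[:-1]
            best = total if best is None else min(best, total)
        profile.append(best)
    return profile


print(column_distances([C1, C2], start=0, order=2))
print(column_distances([C1, C2], start=1, order=2))
```

The enumeration grows as $2^i$, so this approach is only practical for small orders; it is meant as a check, not as a search tool.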

It might happen that the time invariant MFD convolutional code and the periodic time varying
convolutional code for a given rate and memory have the same free distance. For example, the
(2, 1, 2) time invariant code with g = [7,2] has $d_{free} = 4$ and the periodic time varying code with
the encoder shown in Figure 1 also has $d_{free} = 4$. Thus, to resolve ties in $d_{free}$, two other criteria
are introduced. The first criterion is $N_{d_{free}}$, the average number of paths of weight $d_{free}$ per time
instant. The second criterion is $I_{d_{free}}$, the average number of information bits per time instant
along paths with weight $d_{free}$. If two codes have the same $d_{free}$, the one with the smaller value of
$N_{d_{free}}$ and then $I_{d_{free}}$ is said to be better. Mathematically, $N_{d_{free}}$ and $I_{d_{free}}$ are defined as

$$N_{d_{free}} = \frac{1}{P}\,\#\{ u_{[0,\infty)} : u_{[0,P)} \neq 0,\ w_H(v_{[0,\infty)}) = d_{free} \}$$

and

$$I_{d_{free}} = \frac{1}{P} \sum_{u_{[0,\infty)} \in S} w_H(u_{[0,\infty)}),$$

where #{·} denotes cardinality and S is the set of input sequences that produce output sequences
with Hamming weight $d_{free}$.

The $N_{d_{free}}$ and $I_{d_{free}}$ of the code in Figure 1 are both 1/2. There is only one path of weight 4
for this code. Therefore, $N_{d_{free}} = 1/2$. The corresponding input is $u_{[0,4)} = (0\,1\,0\,0)$. The weight
of this information sequence is 1. Since there is only one path, the sum of the weights is 1 and
$I_{d_{free}} = 1/2$.


3.2  Previous  Results 


Motivated by Costello's conjecture, early work on periodic time varying codes focused on searching for
periodic time varying codes with larger $d_{free}$ than the comparable time invariant codes. Mooser [1]
found that some periodically time varying convolutional codes have the same $d_{free}$ as the time
invariant MFD code, but have both a smaller $N_{d_{free}}$ and $I_{d_{free}}$. The codes he found are (2, 1, 4, P)
periodic convolutional codes with P = 1, 2 and 3 and are listed in Table 2. Note that the time
invariant code [23, 35] is the MFD time invariant code. Simulation results for these codes are shown
in Figure 3. Mooser concentrated on memory 4, rate 1/2 codes, because 4 is the smallest memory
for which Heller's upper bound, $d_{free} = 8$, is not achieved by any time invariant convolutional
code. However, Mooser did not find a periodic time varying code with memory 4 and rate 1/2 that
has $d_{free}$ equal to 8.


    P   r     dfree   Ndfree   Idfree   Generator Sequences (Octal)
    1   1/2   7       2        4        [23,35]
    2   1/2   7       3/2      3        [23,35] [25,37]
    3   1/2   7       4/3      8/3      [23,35] [25,37] [27,35]

Table 2: The (2, 1, 4) time invariant MFD code and the (2, 1, 4, P) periodic time varying
convolutional codes found by Mooser with P = 2 and 3.

Palazzo [2] found some (4, 1, 2, P) periodic time varying convolutional codes with P = 1, 2, 3, 4
and 5, which have the same $d_{free}$ as the time invariant MFD codes, but smaller $N_{d_{free}}$ and $I_{d_{free}}$.
These codes are listed in Table 3. Simulation results for these codes are shown in Figure 4. Note
that the $[5\,7^3]$ code is the MFD time invariant code. Here, $5\,7^3$ means the generator sequences
$g^{(0)} = 5$, $g^{(1)} = g^{(2)} = g^{(3)} = 7$.


    P   r     dfree   Ndfree   Idfree   Generator Sequences (Octal)
    1   1/4   10      1        2        5 7^3
    2   1/4   10      1/2      1/2      5^2 7^2   5 7^3
    3   1/4   10      2/3      2/3      5^2 7^2   5^2 7^2   5 7^3
    4   1/4   10      3/4      3/4      5^2 7^2   5^2 7^2   5^2 7^2   5 7^3
    5   1/4   10      4/5      4/5      5^2 7^2   5^2 7^2   5^2 7^2   5^2 7^2   5 7^3

Table 3: The (4, 1, 2) time invariant MFD code and (4, 1, 2, P) periodic time varying convolutional
codes found by Palazzo with P = 2, 3, 4, and 5.

Figure 3: Simulation results of the time invariant MFD code and the (2, 1, 4, P) codes found by
Mooser.

Palazzo [2] also found a (3, 2, 1, 2) periodic time varying code with $d_{free}$ greater than that of the time
invariant MFD code. This code is shown in Table 4, where [6,4,0; 2,6,6] means that the generator
sequences are $g_1^{(0)} = (1\,1\,0)$, $g_1^{(1)} = (1\,0\,0)$, $g_1^{(2)} = (0\,0\,0)$, $g_2^{(0)} = (0\,1\,0)$, $g_2^{(1)} = (1\,1\,0)$ and $g_2^{(2)} = (1\,1\,0)$.

The (3, 2, 1) MFD time invariant code is also listed in Table 4. The simulation results of these
codes are shown in Figure 5.


    P   r     dfree   Ndfree   Idfree   Generator Sequences (Octal)
    1   2/3   3       2        4        [6,2,6; 2,4,4]
    2   2/3   4       23/2     66/2     [6,4,6; 2,6,0] [6,4,0; 2,6,6]

Table 4: The (3, 2, 1) time invariant MFD code and the (3, 2, 1, 2) periodic time varying
convolutional code found by Palazzo.

By randomly searching for period P = 4 time varying convolutional codes, Lee [3] found a code
with memory 4 and rate 1/2 which achieves Heller's upper bound. It is the first convolutional code
with memory 4 that achieves the Heller bound. This code is listed in Table 5 together with the
(2, 1, 4) time invariant MFD code. The simulation results for these codes are shown in Figure 6.


    P   r     dfree   Ndfree   Idfree   Generator Sequences (Octal)
    1   1/2   7       2        4        [23,35]
    4   1/2   8       49/4     178/4    [37,20] [37,35] [33,25] [33,25]

Table 5: The (2, 1, 4) time invariant MFD code and the (2, 1, 4, 4) periodic time varying
convolutional code found by Lee.

Figure 4: Simulation results of the time invariant MFD code and the (4, 1, 2, P) codes found by
Palazzo.

Studying the performance of the codes found by Palazzo, it can be seen from Figure 4 and Figure 5
that the periodic time varying codes show a small performance improvement over the time invariant
MFD codes. In Figure 4, all the periodic time varying codes are better than the time invariant MFD
code in terms of $P_b$, and all these codes have smaller $N_{d_{free}}$ and $I_{d_{free}}$ than the time invariant
MFD code. In Figure 5, the (3, 2, 1, 2) periodic time varying code with $d_{free} = 4$ has better $P_b$ for
almost all SNRs than the (3, 2, 1) time invariant MFD code with $d_{free} = 3$.

However, from Figure 3, the (2, 1, 4, P) codes found by Mooser do not have better performance
than the (2, 1, 4) time invariant MFD code even though the (2, 1, 4, P) codes are better in terms
of $N_{d_{free}}$ and $I_{d_{free}}$. Also, from Figure 6, it can be seen that the (2, 1, 4, 4) code with $d_{free} = 8$ has
almost the same performance as the (2, 1, 4) time invariant MFD code with $d_{free} = 7$. Hence, periodic
time varying codes with larger $d_{free}$ than the comparable time invariant codes, or with the same
$d_{free}$ but smaller $N_{d_{free}}$ and $I_{d_{free}}$, are not guaranteed to perform better at low and moderate
SNRs than the comparable time invariant codes. In fact, the distance spectrum is the main factor
in determining the error probability when Viterbi decoding is used for a convolutional code.

The code search in this thesis has two goals. The first is to test the argument of Johannesson [4]
that the Heller bound can be applied to time varying codes but the Griesmer bound
cannot. From Figure 7, it can be seen that there is a gap between the Heller bound and the Griesmer
bound for memories 5, 12, and so on. For example, the Heller bound for rate 1/2 memory 5 time
varying codes is 9 while the Griesmer bound is 8. A search for (2, 1, 5, P) periodic time varying
codes with $d_{free} = 9$ is performed in this thesis. Memory 5 is the smallest memory for rate 1/2 codes
at which the Griesmer bound does not agree with the Heller bound. The second goal of the search
is to find new codes of rate 1/2 and memory greater than 5 with larger free distance than the
comparable time invariant MFD codes. The first interesting class of codes is the (2, 1, 7, P) codes
with $d_{free} = 11$.
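The Heller bound values quoted in this section can be reproduced from the commonly cited closed form for rate 1/n codes, $d_{free} \le \min_{k \ge 1} \lfloor \frac{2^{k-1}}{2^k - 1}\, n\,(m + k) \rfloor$. This formula is not stated in the report and is assumed here; the function name `heller_bound` and the cutoff `max_k` are illustrative choices.

```python
def heller_bound(n: int, m: int, max_k: int = 64) -> int:
    """Heller upper bound on d_free for a rate 1/n code with memory m.

    Assumes the standard form min over k >= 1 of
    floor(2^(k-1) / (2^k - 1) * n * (m + k)); integer arithmetic avoids
    rounding issues. max_k only needs to exceed the minimising k.
    """
    return min((2 ** (k - 1) * n * (m + k)) // (2 ** k - 1) for k in range(1, max_k + 1))


for memory in (4, 5, 7):
    print(memory, heller_bound(2, memory))
```

For rate 1/2 this gives 8, 9 and 11 for memories 4, 5 and 7, which agrees with the values used in the text.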




Figure  5:  Simulation  results  of  the  time  invariant  MFD  code  and  the  (3,2, 1,2)  codes  found  by 
Palazzo. 

3.3  Search  Technique  and  Result 

So  far  there  has  been  little  success  in  finding  good  convolutional  codes  in  terms  of  large  free  distance 
by  algebraic  methods.  Most  codes  are  found  by  computer  search  [5,  6,  7,  8,  9].  A  computer  aided 
code  search  based  on  the  FAST  algorithm  [10]  for  computing  the  distance  spectrum  of  time  invariant 
convolutional  codes  is  used  in  this  thesis.  With  a  few  modifications,  FAST  can  be  used  to  find  the 
free  distance  and  the  distance  spectrum  of  a  periodic  time  varying  code. 


3.3.1  The  FAST  Algorithm  for  Time  Invariant  Convolutional  Codes 


The FAST algorithm for finding the distance spectrum was developed by Cedervall and Johannesson
[10, 4]. It utilizes the distance profile to dramatically reduce the search to a relatively small number
of codewords. Recall that the distance profile consists of the first m + 1 orders of the column distance. This
algorithm can easily be modified to find the free distance of a time invariant code and to determine
the number of nonzero information bits and path length of a codeword.

Let $n_{d_{free}+i}$ denote the number of paths of weight $d_{free} + i$ which depart from the all zero path
at the root node in the code trellis and do not reach the zero state until their termini. That is,
$n_{d_{free}+i}$ is the number of codewords with weight $d_{free} + i$. $n_{d_{free}+i}$ is called the (i + 1)th spectral
component. The sequence

$$n_{d_{free}+i}, \quad i = 0, 1, 2, \ldots,$$

is called the distance spectrum of the code.
Figure 6: Simulation results of best time invariant codes and (2, 1, 4, 4) codes found by Lee.


To compute the distance spectrum for a time invariant convolutional code encoded by a feedforward
encoder, the FAST algorithm exploits the linearity of the code and counts the number of weight
d codewords with $u_0 \neq 0$ diverging from the zero state and remerging for the first time at the
zero state. As before, $u_t$ and $v_t$ denote the input and output at time instant t, respectively. This
algorithm performs a sequential search in the code trellis to find a sequence of weight d and, hence,
it is essentially a progressive trellis search. For simplicity, the discussion is limited to the case of
rate 1/2. The extension to rate k/n is straightforward.

For an arbitrary node at depth t in the code trellis, let $W^*$, where $W^* \le d$, be the accumulated
total weight of a certain path produced by t - 1 inputs. A subtrellis is defined as the remaining
forward trellis connecting to this node. For a path of total weight d through this node, the
subtrellis stemming from the node has to spend exactly weight $(d - W^*)$ before it terminates at the
zero state. Hence, each node is labeled with the current state of the encoder and the remaining
weight, that is, $W = d - W^*$. When the node is at the zero state and its weight is W = 0, a path with
weight d is found.

Let $S_t = (u_{t-1}\, u_{t-2} \cdots u_{t-m})$ denote the state of the encoder and let $u_t = 0$ for t < 0. This
definition is valid only for feedforward encoders, since at time t the content of the ith stage of the shift
register is just the input at time t - i. For feedback encoders (encoders with feedback connections),
the state depends not only on the previous inputs but also on the output of the encoder.
Therefore, the state of a feedback encoder is not $(u_{t-1}\, u_{t-2} \cdots u_{t-m})$. There are two successor
states from each state, namely, $S^0_{t+1} = (0,\, u_{t-1} \cdots u_{t-m+1})$ and $S^1_{t+1} = (1,\, u_{t-1} \cdots u_{t-m+1})$,
corresponding to $u_t$ equal to zero and one, respectively. To simplify the notation, t will be omitted
in the sequel. Let $w^0$ and $w^1$ denote the branch weights stemming from a node. Using these branch
weights together with the current node weight W, the two successor node weights can be expressed
as

$$W^0 = W - w^0 \quad \text{and} \quad W^1 = W - w^1.$$

This is illustrated in Figure 8.

Figure 7: Heller bound, Griesmer bound and $d_{free}$ of the time invariant MFD codes.



Figure  8:  Successor  nodes  at  time  t. 


When searching for a path in the code trellis with a given weight, a subtrellis will be explored if and
only if either of the new node weights, $W^0$ or $W^1$, is nonnegative and if the state of the new node,
$S^0$ or $S^1$, differs from the zero state. A negative node weight means that paths stemming from
the current node have weight larger than d and therefore need not be extended. When the state
of the new node is zero, it means the path has reached its terminus and further extension is not
necessary. Without loss of generality, priority is given to the zero branch whenever a selection has
to be made between two new possible nodes. With this background, a straightforward algorithm
for determining the number of paths of a given weight d can be formulated as follows.

Let the root node be defined as the node with state S = (0 0 ... 0) and weight W = d. Corresponding
to the inputs zero and one, the two successor nodes are the node with state $S^0 = (0\,0 \ldots 0)$ and
weight $W^0 = d$, and the node with $S^1 = (1\,0 \ldots 0)$ and weight $W^1 = d - d^c_0$, where $d^c_0$ is the
0th order column distance. For rate 1/2 codes, $w^1 = d^c_0$. Selecting the first node results
in the all zero path or in a replica of the search obtained when the second node is selected. Therefore,
the search starts at the second node with state $S^1 = (1\,0 \ldots 0)$ and weight $W = d - d^c_0$ and
moves forward in the code trellis. If the state of the current node is S = (0 0 ... 1) and the new node
corresponding to zero input has weight $W^0 = 0$ and state $S^0 = (0\,0 \ldots 0)$, then the path counter
$n_d$ is increased by 1. $n_d$ is used to keep track of the number of paths with weight d. If the new
node weight is negative or if the new node is in its zero state, then the search moves backward.
Thus, all the previous information symbols have to be stored so that the algorithm can move back
until a new “one” branch with a nonnegative node weight is found. The search then moves forward
again. A stop condition appears when the search reaches the root node.
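The basic search described above can be sketched as follows. This is an illustration only and not the FAST algorithm itself: it uses an explicit stack in place of the stored information symbols, it assumes a non-catastrophic feedforward code (so there are no zero-weight cycles away from the zero state), and the name `count_paths` is hypothetical. The generators are those of the (2, 1, 3) example code, [74, 54] in left-justified octal.

```python
# Generators of the (2, 1, 3) code [74, 54] (octal, left-justified);
# taps act on (u_t, u_{t-1}, u_{t-2}, u_{t-3}).
GENS = ((1, 1, 1, 1), (1, 0, 1, 1))


def count_paths(gens, d):
    """Count paths of weight exactly d that leave the zero state on the
    "one" branch and remerge with it for the first time."""
    m = len(gens[0]) - 1
    zero = (0,) * m

    def step(state, bit):
        window = (bit,) + state
        w = sum(sum(g * x for g, x in zip(gen, window)) % 2 for gen in gens)
        return window[:-1], w

    first, w0 = step(zero, 1)
    stack = [(first, d - w0)]  # each node: (state, remaining weight W)
    found = 0
    while stack:
        state, W = stack.pop()
        for bit in (0, 1):
            nxt, w = step(state, bit)
            W_new = W - w
            if W_new < 0:
                continue  # accumulated weight already exceeds d
            if nxt == zero:
                if W_new == 0:
                    found += 1  # terminus reached with weight exactly d
                continue  # a remerged path is never extended
            stack.append((nxt, W_new))
    return found


for d in (5, 6, 7):
    print(d, count_paths(GENS, d))
```

No path of weight 5 is found and exactly one path of weight 6, consistent with $d_{free} = 6$ for this code. The sketch prunes only on negative node weight, which is the inefficiency that the distance profile bound is meant to remove.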

This basic algorithm is very time consuming. The efficiency of the FAST algorithm will be
explained through an example. A (2, 1, 3) time invariant code is used for this example. The generator
sequences of this code are $[g^{(0)}, g^{(1)}] = [74, 54]$. It has a distance profile of d = [2, 3, 3, 4]. In Figure
9, the part of its trellis which contains the weight $d_{free} = 6$ path is shown. This path corresponds to
the information sequence $u_{[0,5)} = (1\,1\,0\,0\,0)$ and to the encoded sequence $v_{[0,5)} = (11\ 01\ 01\ 00\ 11)$.
Since the column distance is the minimum of the Hamming weights of the paths with $u_0 = 1$, the
distance profile can be used as a lower bound on the decrease of the node weight along the path.
In general, the distance profile gives the minimum accumulated path weights for each of the first
m + 1 time units. Since the node weight is computed as the total weight minus the accumulated
path weight and the accumulated path weight is lower bounded by the distance profile, the decrease
of the node weight, which is the total weight minus the node weight, is lower bounded by the distance
profile.

Figure 9: An example of a weight $d_{free}$ path.

For example, at time unit 1, from Figure 9, the node weight is W = 4 due to the branch weight
$w^1 = 2$. The difference between the total weight d = 6 and the node weight 4 is 2. Hence, the
node weight has decreased by 2. At time unit 2, the node weight goes down to W = 3 because of the
branch weight $w^1 = 1$. Therefore, the node weight has decreased by 3 in total. The differences between
the total weight and the node weight for time units 3 and 4 are 4 and 4, respectively. The decrease of the
node weights for the four units is [2, 3, 4, 4]. The distance profile d is a tight lower bound for time units 1, 2 and 4.
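The numbers in this example can be checked with a few lines of code. This is a sketch only; the helper name `encode` is illustrative, and the generators [74, 54] are read as the left-justified tap vectors (1 1 1 1) and (1 0 1 1) acting on $(u_t, u_{t-1}, u_{t-2}, u_{t-3})$.

```python
from itertools import accumulate

# [74, 54] in left-justified octal for a memory 3 encoder.
GENS = ((1, 1, 1, 1), (1, 0, 1, 1))


def encode(u, gens):
    """Feedforward encoding; returns one output block per input bit."""
    m = len(gens[0]) - 1
    state = (0,) * m
    out = []
    for bit in u:
        window = (bit,) + state
        out.append(tuple(sum(g * x for g, x in zip(gen, window)) % 2 for gen in gens))
        state = window[:-1]
    return out


d = 6
v = encode([1, 1, 0, 0, 0], GENS)
spent = list(accumulate(sum(block) for block in v))  # accumulated path weight
remaining = [d - s for s in spent]                   # node weight W after each branch

print(v)
print(spent)
print(remaining)
```

The first four entries of `spent` are the decreases [2, 3, 4, 4] discussed in the text, and they can be compared entry by entry with the distance profile [2, 3, 3, 4].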

Before further discussion of the algorithm, let us first look at how to move backward in the trellis.
The two encoders shown in Figure 10 have the same connections, except that the input of encoder
(a) enters from the left side while the input of encoder (b) enters from the right side. The trellises of these
two encoders are shown in Figure 11. Comparing these two trellises, it is easy to see that the two
encoders share the same trellis structure with the same output branch labels. The only difference is
that one moves from left to right (forward) and the other one moves from right to left (backward).
Therefore, in order to travel in the opposite direction in a trellis, the input should go into the
encoder from the other side. For example, let $u_{[0,5)} = (1\,0\,1\,0\,0)$. The output of encoder (a)
is $v_{[0,5)} = (11\ 11\ 10\ 11\ 01)$ and the output of encoder (b) is $v_{[0,5)} = (01\ 11\ 10\ 11\ 11)$. Note that the
order of these two outputs is reversed due to the opposite travel direction. These results can be
obtained by following the two red paths in Figure 11.

Figure 10: Encoders of a (2, 1, 2) time invariant code with left side input (a) and right side input (b).

If the search traverses this path in the opposite direction, it will get the same total weight but
different node weights. The node weight can be explained as the weight that needs to be spent on
the remainder of this path (backward). It is the decrease of node weight when moving forward.
Thus, the node weights are lower bounded by the distance profile when moving backward in the
trellis. In Figure 12, the distance profile is used as a lower bound on the node weight along the
path. Notice that if a node has weight less than this bound, then every path leading backward to
the zero state will give a negative node weight at the root node. For example, if the node weight in
state (0 0 1) is less than $d^c_3 = d^c_m = 4$, this node will not be extended when traversing the trellis
backward. This is because, from the state (0 0 1) in that path, at least weight 4 needs to be spent
for the next 4 steps to reach the all zero state. More generally, the weight of a backward path
stemming from a node in state $S \neq (0\,0 \ldots 0)$, starting with a one branch and eventually leading to
the root node (zero state), is lower bounded by $d^c_m$.

The use of the distance profile as a lower bound on the node weights works for every path from
the end to the root node. Moving backward from state S, two possible states $S^{-0}$ and $S^{-1}$ will
be reached, where

$$S = (?\,\cdots\,?\,1\,\underbrace{0 \cdots 0\,0}_{l-1\ \text{zeros}})$$

$$S^{-0} = (?\,\cdots\,?\,1\,\underbrace{0\,0 \cdots 0\,0}_{l\ \text{zeros}})$$

$$S^{-1} = (?\,\cdots\,?\,1\,\underbrace{0\,0 \cdots 0}_{l-1\ \text{zeros}}\,1)$$

The minimum weights of the backward paths stemming from the states $S^{-0}$ and
$S^{-1}$ are lower bounded by $d^c_{m-l-1}$ and $d^c_{m-1}$, respectively. For the backward paths stemming
from the state $S^{-0}$, at least m - l more steps are needed for the paths to reach the zero state,
which means at least weight $d^c_{m-l-1}$ needs to be spent. For the backward paths stemming from
the state $S^{-1}$, at least m more steps are needed for the paths to get to the zero state and at least
weight $d^c_{m-1}$ needs to be spent. Therefore, $d^c_{m-l-1}$ and $d^c_{m-1}$ become lower bounds for the node
weights at states $S^{-0}$ and $S^{-1}$.

Figure 11: Trellis of the two encoders in Figure 10.

Moving backward in the trellis is not convenient because it requires input from the other side of
the encoder. In fact, moving forward in the trellis generated by the reversed generator sequences
is equivalent to moving backward in the trellis of the original code. Figure 13 shows the encoder
obtained by reversing the generator sequences of encoder (a) in Figure 10. Its trellis is shown in
Figure 14. Let $u_{[0,6)} = (10100)$. The output of this “reversed” encoder is
$v_{[0,6)} = (01\ 11\ 10\ 11\ 11)$, which is the same as the output of encoder (b) in Figure 10, whose trellis
moves backward. Therefore, instead of moving backward in the trellis, the generator sequences can
be reversed and the search can move forward in the corresponding trellis, and the distance profile
(of the original generator sequences) can be used to effectively limit the part of the trellis that must
be explored.
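
The equivalence between moving backward in the original trellis and moving forward in the trellis of the reversed generator sequences can be checked numerically. The following is a minimal sketch, assuming a zero-terminated feedforward encoder; the function names and the (2,1,3) generators are hypothetical and chosen only for illustration (they are not the encoders of Figure 10).

```python
def encode(gens, bits):
    """Feedforward rate-1/n convolutional encoder, zero-terminated.

    gens: list of generator sequences, each a tuple (g_0, ..., g_m).
    bits: information bits; m zeros are appended to return to state 0.
    Returns one tuple of n output bits per trellis branch.
    """
    m = len(gens[0]) - 1
    padded = list(bits) + [0] * m
    out = []
    for t in range(len(padded)):
        # window[i] = u_{t-i}, with zeros before the start of the sequence
        window = [padded[t - i] if t - i >= 0 else 0 for i in range(m + 1)]
        out.append(tuple(sum(g[i] & window[i] for i in range(m + 1)) % 2
                         for g in gens))
    return out


def reverse_generators(gens):
    """Reverse every generator sequence: (g_0, ..., g_m) -> (g_m, ..., g_0)."""
    return [tuple(reversed(g)) for g in gens]


# Hypothetical generators, for illustration only.
gens = [(1, 0, 1, 1), (1, 1, 1, 1)]
u = [1, 0, 1, 0, 0, 1]

forward = encode(gens, u)                              # original code, forward in time
backward = encode(reverse_generators(gens), u[::-1])   # reversed code, reversed input

# The reversed encoder produces the original branches in reverse order,
# so both paths carry the same total Hamming weight.
print(backward == forward[::-1])
```

In this sketch the two output bits within a branch keep their order; only the sequence of branches is reversed.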


Figure 12: The weight $d_{free}$ path traversed backward.


Figure 13: Encoder of a (2,1,2) time invariant code with the reversed generator sequences of encoder
(a) shown in Figure 10.

The FAST algorithm can now be described as follows. Suppose the distance spectrum of a code with
generator sequences $[g^{(1)}, g^{(2)}]$ and distance profile $d$ is to be determined. Let $[\tilde{g}^{(1)}, \tilde{g}^{(2)}]$
denote the generator sequences of the reversed time invariant convolutional code. To calculate the
$i$th spectral component, start at state $S = (10 \ldots 0)$ with weight $W = d_{free} + i - d_{c,0}$ in the code
trellis generated by $[\tilde{g}^{(1)}, \tilde{g}^{(2)}]$. This weight will be reduced by the weight of the branches that the
search traverses when the code trellis is searched for nodes with both node weight and state equal
to zero. For each explored node, the column distance $d_{c,m-l-1}$ or $d_{c,m-1}$ will be used to lower
bound the weight of any path leading to a zero state. If the current weight is less than this bound,
a nonzero state with zero or negative weight will always be reached. Hence, it is only necessary to
extend a node if the node weight is larger than or equal to this bound.

If both successor nodes are achievable, then the “zero” branch is selected and the “one” branch
node (state $S^1$ and weight $W^1$) is pushed onto a stack. Thus, the weight of this node will not be
calculated twice while moving back. The complete FAST algorithm is described below and
the flow chart is shown in Figure 15. Notice that $w^i$ is calculated using the reversed generator
sequences.


F1 (Initialize) Set $l \leftarrow 1$, $n_d \leftarrow 0$, $W \leftarrow d - d_{c,0}$, and $S = [1, 0, \ldots, 0]$.

F2 (Next nodes) Calculate $S^0$, $S^1$, $W^0$ and $W^1$. If $l < m$, go to F6.

F3 (Return to zero) If $W^0 = 0$, set $n_{d-W^0} \leftarrow n_{d-W^0} + 1$.

F4 (Forward on “one” branch?) If $W^1 < d_{c,m-1}$ or $W < d_{c,m}$, go to F5. Otherwise, select the
“one” branch node and set $l \leftarrow 1$. Go to F2.

F5 (From stack?) If the stack is empty, the algorithm terminates with the result in $n_d$. Otherwise,
select the node from the stack and set $l \leftarrow 1$. Go to F2.

F6 (Forward on “zero” branch?) If $W^0 < d_{c,m-l-1}$, go to F4.

F7 (Save “one” branch node?) If $W^1 \geq d_{c,m-1}$ and $W \geq d_{c,m}$, save the “one” branch node. In
any case, select the “zero” branch node and set $l \leftarrow l + 1$. Go to F2.

Figure 14: Trellis of the encoders in Figure 13.

The algorithm described above assumes the free distance is known. It can be modified
to find the free distance of a code when it is unknown, as follows.

Initialize $d$ to be sufficiently large, e.g., 1000. If $d > d_{free}$, the search will reach a zero state with a
positive node weight $W^0$, i.e., $d_{free}$ is at most $d - W^0$. Hence all node weights in the stack can be
reduced by $W^0$ (adjust stack) and the path counter reset to 1. When the search stops, $d = d_{free}$.
The modifications of F3 in the algorithm are described below and the modified flow chart is shown
in Figure 16.

F3 (Return to zero) If $W^0 = 0$, set $n_d \leftarrow n_d + 1$; else if $W^0 > 0$, set $d \leftarrow d - W^0$,
$W^1 \leftarrow W^1 - W^0$, $W \leftarrow W - W^0$, $n_d \leftarrow 1$ and adjust stack.
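
For small codes, the output of a spectrum search can be cross-checked against a plain exhaustive search. The sketch below is not the FAST algorithm: it is a brute-force depth-first search over the forward trellis that prunes only on the accumulated weight and does not use the distance profile. The function name and the choice of the well-known (2,1,2) code with generators (111, 101) as a test case are assumptions made for illustration. It terminates only for non-catastrophic encoders.

```python
def distance_spectrum(gens, d_max):
    """Count detours from the all-zero path by Hamming weight, up to d_max.

    gens: generator sequences (g_0, ..., g_m) of a rate-1/n feedforward code.
    Returns (n_paths, info_weight): dicts mapping weight d to the number of
    paths of weight d and to their total information weight.
    """
    m = len(gens[0]) - 1

    def step(state, u):
        # state holds the last m inputs, newest first
        window = (u,) + state
        weight = sum(sum(g[i] & window[i] for i in range(m + 1)) % 2
                     for g in gens)
        return window[:m], weight

    n_paths, info_weight = {}, {}
    start_state, start_weight = step((0,) * m, 1)   # leave the zero state with a "1"
    stack = [(start_state, start_weight, 1)]
    while stack:
        state, weight, ones = stack.pop()
        for u in (0, 1):
            nxt, branch_weight = step(state, u)
            total = weight + branch_weight
            if total > d_max:
                continue                             # weight budget exceeded
            if not any(nxt):                         # remerged with the zero state
                n_paths[total] = n_paths.get(total, 0) + 1
                info_weight[total] = info_weight.get(total, 0) + ones
            else:
                stack.append((nxt, total, ones + u))
    return n_paths, info_weight


# Test case: the standard (2,1,2) code with generators 111 and 101.
n_d, i_d = distance_spectrum([(1, 1, 1), (1, 0, 1)], d_max=8)
print(sorted(n_d.items()), sorted(i_d.items()))
```

The pruning here is much weaker than the column-distance bounds used in steps F4, F6 and F7, so this search is only practical for short memories and small weight ranges.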


3.3.2  Modification  to  FAST  for  Periodic  Time  Varying  Codes 


In order to make FAST work for periodic time varying codes, three modifications are necessary.
First, since one periodic time varying encoder consists of $P$ time invariant encoders (as shown in
Figure 1), the generator sequences of all $P$ time invariant encoders have to be reversed instead
of just one. Second, the distance profile used must be modified to be

$d^p = [d^p_{c,0}, d^p_{c,1}, \ldots, d^p_{c,m}] = [\min_j d_{c,0}(j), \min_j d_{c,1}(j), \ldots, \min_j d_{c,m}(j)], \quad j = 1, 2, \ldots, P.$

Third, the time invariant code $C_j$ used at each time instant must be recorded.

Figure 15: Flow chart of the FAST algorithm.


Recall that for time invariant codes, moving forward in the trellis generated by the reversed code
is equivalent to moving backward in the trellis of the original code. This is also true for periodic
time varying codes. However, moving backward in the trellis of a periodic time varying code
$C_1 C_2 \cdots C_{P-1} C_P$ is equivalent to moving forward in the trellis of $C_P C_{P-1} \cdots C_2 C_1$. A counter $j$ is
introduced in the algorithm to keep track of the current time invariant code.
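
The reversal of the code order can be illustrated with a small experiment. This is a sketch under stated assumptions: a zero-terminated feedforward periodic encoder, hypothetical function names, and hypothetical (2,1,3) component generators. The starting phase of the reversed encoder is chosen so that the last branch of the original path becomes the first branch of the reversed one.

```python
def encode_tv(codes, bits, start=0):
    """Zero-terminated periodic time-varying feedforward encoder.

    codes: list of P component codes, each a list of generator tuples
           (g_0, ..., g_m) with a common memory m.
    The component code used at time t is codes[(start + t) % P].
    """
    m = len(codes[0][0]) - 1
    P = len(codes)
    padded = list(bits) + [0] * m
    out = []
    for t in range(len(padded)):
        gens = codes[(start + t) % P]
        window = [padded[t - i] if t - i >= 0 else 0 for i in range(m + 1)]
        out.append(tuple(sum(g[i] & window[i] for i in range(m + 1)) % 2
                         for g in gens))
    return out


def reversed_code(codes):
    """C_1 ... C_P -> C_P ... C_1, each with reversed generator sequences."""
    return [[tuple(reversed(g)) for g in gens] for gens in reversed(codes)]


# Hypothetical period-3 code, for illustration only.
codes = [
    [(1, 0, 1, 1), (1, 1, 1, 1)],
    [(1, 1, 0, 1), (1, 0, 0, 1)],
    [(1, 1, 1, 1), (1, 0, 1, 1)],
]
u = [1, 0, 1, 1, 0, 0, 1]
m, P = 3, len(codes)
n_branches = len(u) + m

forward = encode_tv(codes, u)
# Phase that aligns reversed time 0 with the last branch of the forward path.
phase = (-n_branches) % P
backward = encode_tv(reversed_code(codes), u[::-1], start=phase)

print(backward == forward[::-1])
```

The phase offset plays the role of the counter that tracks which component code is active at each step of the search.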


Let $w^0(j)$ and $w^1(j)$ denote the branch weight generated by code $C_j$ with input “0” or “1”,
respectively. Note that in order to find all possible paths of weight $d$ for the periodic code
$C_1 C_2 \cdots C_{P-1} C_P$, FAST has to be run $P$ times to check the $P$ circular shifts, e.g.,
$C_2 C_3 \cdots C_P C_1$. Steps F1 and F2 are modified as:

F1 (Initialize) Set $j \leftarrow 1$, $l \leftarrow 1$, $n_d \leftarrow 0$, $W \leftarrow d - d^p_{c,0}$, and $S = [1, 0, \ldots, 0]$.

F2 (Next nodes) Calculate $j = (j + P - 1) \bmod P$, $S^0$, $S^1$, $W^0 = W - w^0(j)$ and $W^1 = W - w^1(j)$.
If $l < m$, go to F6.

Figure 16: Flow chart of the FAST algorithm when $d_{free}$ is unknown.


Figure  17  shows  the  flow  chart  of  the  modified  FAST  algorithm  for  periodic  time  varying  codes. 
Figure 18 shows the flow chart of FAST for periodic time varying codes when $d_{free}$ is unknown.


3.3.3  Search  Technique 

Before running the code search program, the ensemble of periodic time varying codes needs to be
generated. First, the ensemble of (2,1,m) time invariant codes is generated. For (2,1,m) codes,
there are two generator sequences, $g^{(1)}$ and $g^{(2)}$. Each generator has $m + 1$ bits that can be “0” or
“1”, so there are $2^{m+1}$ possible values for each generator sequence. Therefore, there is a total of
$2^{m+1} \times 2^{m+1}$ possible codes. However, time invariant codes generated in this way may not have
memory $m$. For example, to generate a (2,1,2) code, g = [4, 2] = [110, 010] is a valid combination,
but it is no longer a valid (2,1,2) code because the last positions of $g^{(1)}$ and $g^{(2)}$ are both “0”. To
solve this problem, the following restriction is imposed on $g^{(1)}$: the mod 2 adder has connections to
the present input and to the oldest input, that is, $g^{(1)}$ has “1”s in the first and last positions.
Therefore, there are $2^{m-1}$ possible values for $g^{(1)}$. $g^{(2)}$ is constructed without the above restriction
and still has $2^{m+1}$ values. Next, periodic time varying codes are generated based on the time
invariant code ensemble. If the period is 2, then two time invariant code ensembles are chosen. The
period 2 time varying codes are generated by making all possible combinations of these two
ensembles. To generate period 3 time varying codes, pick 3 time invariant code ensembles and form
the combinations. For period P codes, P time invariant code ensembles are used to make the
combinations.

Figure 17: Flow chart of the modified FAST algorithm.

As can be seen from the above discussion, the number of possible codes increases exponentially with
the period. Eliminating the equivalent codes from the code ensemble becomes very important in
order to make the code search feasible. By studying periodic time varying codes, a criterion is
found that eliminates a large number of codes.

First, consider a period 2 code in the order $C_1 C_2$; the code in the order $C_2 C_1$ is
the same code as $C_1 C_2$. Next, for a period 3 code in the order $C_1 C_2 C_3$, the equivalent codes are
$C_2 C_3 C_1$ and $C_3 C_1 C_2$. For the period 3 code, the last two codes are just circular shifts of the first
one. From the discussion of the FAST algorithm, the algorithm itself checks all the circular shifts,
and these three codes have the same set of circular shifts. Hence, the last two are equivalent to
the first one. The same principle can be used to find equivalent codes for any period. That is, codes
that are generated by a circular shift right or left j times, j = 0, ..., P, are equivalent codes. Using
this method, about 25% of the codes can be eliminated.

Figure 18: Flow chart of the modified FAST algorithm when $d_{free}$ is unknown.
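
The ensemble construction and the circular-shift equivalence described above can be sketched as follows. This is an illustrative sketch, not the search program used for the reported results; the function names are hypothetical, and each equivalence class is represented by its lexicographically smallest circular shift, which is one possible choice of representative.

```python
from itertools import product


def component_codes(m):
    """All (g1, g2) pairs of a (2,1,m) code with g1 starting and ending in 1."""
    inner = product((0, 1), repeat=m - 1)
    g1s = [(1,) + mid + (1,) for mid in inner]      # 2**(m-1) choices
    g2s = list(product((0, 1), repeat=m + 1))       # 2**(m+1) choices
    return [(g1, g2) for g1 in g1s for g2 in g2s]


def canonical(code_seq):
    """Representative of a periodic code: its smallest circular shift."""
    P = len(code_seq)
    return min(tuple(code_seq[(s + k) % P] for k in range(P))
               for s in range(P))


def periodic_ensemble(m, P):
    """Period-P code sequences with circular shifts identified."""
    comps = component_codes(m)
    return {canonical(seq) for seq in product(comps, repeat=P)}


comps = component_codes(2)
print(len(comps))                     # component codes for m = 2
print(len(comps) ** 2, len(periodic_ensemble(2, 2)))   # before and after pruning
```

The sketch enumerates every ordered combination before pruning, so it is only meant for small memories and periods; a practical search would generate representatives directly.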

As mentioned before, the FAST algorithm performs a progressive trellis search. If the search enters
a path that cannot gain any weight, it will keep going and never stop. This happens when a
catastrophic encoder is being searched. Since the generated code ensemble contains many catastrophic
codes, these catastrophic codes should be eliminated before the search starts, or the FAST algorithm
needs to be modified to detect a catastrophic code. A discussion on checking the catastrophic
condition will be presented in Chapter 4 based on a transition matrix analysis. This can be used
to eliminate the catastrophic encoders before the search starts. A trivial modification to FAST to
identify a catastrophic encoder is to add a counter to keep track of the path weight. If the weight
does not increase for a long time, then the search should stop.
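
For the time invariant component codes, a simple pre-filter is available. The sketch below is not the transition matrix analysis referred to above; it uses the standard Massey-Sain condition, under which a rate 1/n feedforward encoder is non-catastrophic if and only if the greatest common divisor of its generator polynomials over GF(2) is a power of $D$. It applies to time invariant encoders only and says nothing about a periodic combination of them. The function names and the integer bit-mask representation are assumptions made for illustration.

```python
def gf2_mod(a, b):
    """Remainder of a(D) / b(D) over GF(2); bit i is the coefficient of D**i."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a


def gf2_gcd(a, b):
    """Euclidean algorithm for polynomials over GF(2)."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a


def is_catastrophic(gens):
    """True if the gcd of the generator polynomials is not a power of D."""
    g = 0
    for p in gens:
        g = gf2_gcd(g, p)
    # A power of D has exactly one nonzero coefficient.
    return (g & (g - 1)) != 0


# 1 + D + D^2 and 1 + D^2 have gcd 1: non-catastrophic.
print(is_catastrophic([0b111, 0b101]))
# 1 + D and 1 + D^2 = (1 + D)^2 share the factor 1 + D: catastrophic.
print(is_catastrophic([0b011, 0b101]))
```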

3.3.4 Search Results


The searches for (2,1,5,P) codes with P = 1, 2, 3, and 4 are exhaustive. The searches for (2,1,5,5)
and (2,1,7,2) codes are not exhaustive because of the huge code ensemble. For these two cases,
codes are picked randomly from the overall ensemble of each code.


Code   C1          C2          d_free   N_dfree   I_dfree
1      (367 255)   (323 247)   11       9         19
2      (335 257)   (323 247)   11       13/2      47/2
3      (375 267)   (323 247)   11       17/2      69/2

Table 6: (2,1,7,2) Codes With Free Distance 11.

In Table 6, all of the (2,1,7,2) codes with free distance 11 are listed. The $C_1$ and $C_2$ column
headings represent the two time invariant codes used in the period 2 time varying code. The
generators are written in octal form. For each code, $N_{d_{free}}$ and $I_{d_{free}}$ are also presented.


Code         d = 10   11     12     13    14      15       16      17
1     N_d    0        9      33/2   14    45      160      363     1517/2
      I_d    0        38     81     91    338     1352     3312    15001/2
2     N_d    0        13/2   ?      26    54      255/2    639/2   791
      I_d    0        47/2   83     165   375     2073/2   2857    7707
3     N_d    0        17/2   16     20    56      136      655/2   794
      I_d    0        69/2   84     138   839/2   1145     3002    7843
MFD   N_d    1        6      12     26    52      132      317     730
      I_d    2        ?      ?      ?     ?       ?        ?       6748

Table 7: (2,1,7,2) Distance Spectrum. Notice that the first distance with nonzero spectrum is
$d_{free}$.

Table 7 lists the distance spectrum of all (2,1,7,2) codes with free distance 11, from d = 10 to
d = 17. $N_d$ is the total number of codewords of weight d divided by the period P and $I_d$ is the total
information weight of all codewords of weight d divided by the period P. The distance spectrum of the
(2,1,7) time invariant MFD code with g = [712, 476] and free distance 10 is also listed in Table 7.
Compared with this time invariant MFD code, the three (2,1,7,2) codes are better in terms of free
distance. However, except for distance 10, the spectrum of the time invariant MFD code is better
(smaller) than that of all three periodic time varying codes.

Unfortunately, after an extensive search of (2,1,5,P) codes with P = 1, 2, 3, 4, and 5, a code with
$d_{free} = 9$ was not found. Since the ensemble of codes with period greater than 5 becomes too
large, these codes were not searched in this thesis. It is still possible that a (2,1,5,P) periodic time
varying code with $d_{free} = 9$ exists.

3.4 Simulation Results


In this section, simulation results for the three periodic time varying codes with $d_{free} = 11$ are
presented. These results are compared with the simulation results for the (2,1,7) time invariant
MFD code with g = [712, 476] [11].

In  these  simulations,  more  than  100  bit  errors  are  counted  for  each  simulation  run  in  order  to  reduce 
the  variance  of  the  estimate.  The  bit  error  rate  is  plotted  versus  signal  to  noise  ratio  (Eb/N0)  in 
all  the  figures.  Here,  Eb  represents  the  energy  per  information  bit  and  N0  is  the  single  sided 
power  spectral  density  of  the  white  noise.  Binary  Phase  Shift  Keying  (BPSK)  signaling  is  used 
over an additive white Gaussian noise (AWGN) channel and soft decisions are used in the decoder.
Figures  19,  20  and  21  show  the  simulation  results  of  the  three  (2, 1,7, 2)  periodic  time  varying 
codes,  respectively. 

An analytical bound on the BER, given by

$$P_b(E) \leq \frac{1}{k} \sum_{d=d_{free}}^{\infty} I_d P_d, \qquad (1)$$

where $I_d$ is the total number of nonzero information bits on all weight $d$ paths and $P_d$ is the
probability that the decoder selects an incorrect path with Hamming distance $d$ from the correct path,
is also shown in the figures. For the AWGN channel with BPSK modulation and a soft decision
decoder,

$$P_d = Q\left(\sqrt{2 d R E_b / N_0}\right), \qquad (2)$$

where $R$ is the code rate and $Q(\cdot)$ is defined as

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2} \, dt.$$

In practice, the summation in (1) is truncated to a finite number of terms to obtain an approximation
of $P_b$. In Figures 19, 20 and 21, 16 terms are used. The 16 $I_d$ terms used in Figure 19 are ½{
76, 162, 182, 676, 2704, 6624, 15001, 41446, 112993, 293480, 744716, 1904866, 4920721, 12584340,
31989423, 81053650 }. The 16 $I_d$ terms used in Figure 20 are ½{ 47, 166, 330, 750, 2073, 5714,
15414, 39728, 104476, 271160, 688352, 1759408, 4491857, 11429308, 28964105, 73067376 }. The
16 $I_d$ terms used in Figure 21 are ½{ 69, 168, 276, 839, 2290, 6004, 15686, 41239, 109246, 281140,
719852, 1843439, 4688815, 11885233, 30098536, 76072907 }.
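
The truncated bound can be evaluated directly from the listed terms. The following is a minimal sketch, assuming $k = 1$ and $R = 1/2$ for the (2,1,7,2) codes and using the 16 terms quoted for Figure 20; the function names are hypothetical, and the Q-function is computed from the complementary error function via $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def union_bound_ber(info_weights, d_free, rate, ebn0_db, k=1):
    """Truncated union bound (1) with P_d from (2).

    info_weights[i] is I_d for d = d_free + i.
    """
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    total = 0.0
    for offset, i_d in enumerate(info_weights):
        d = d_free + offset
        total += i_d * q_func(math.sqrt(2.0 * d * rate * ebn0))
    return total / k


# I_d terms quoted for Figure 20, including the factor 1/2.
I_D_CODE2 = [x / 2 for x in (
    47, 166, 330, 750, 2073, 5714, 15414, 39728, 104476, 271160,
    688352, 1759408, 4491857, 11429308, 28964105, 73067376)]

for snr_db in (3.0, 4.0, 5.0, 6.0):
    print(snr_db, union_bound_ber(I_D_CODE2, d_free=11, rate=0.5, ebn0_db=snr_db))
```

Because the bound is truncated and the union bound is loose at low $E_b/N_0$, the values are meaningful mainly at moderate to high signal to noise ratios.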

The simulation results show that even though the periodic time varying convolutional codes have
larger free distance than the time invariant code, they do not outperform the time invariant code
at low and moderate values of $E_b/N_0$. From Figure 19, it can be seen that the two performance
curves are interwoven. Hence, it is hard to say that either the time invariant code or the periodic
time varying code is better than the other. The same is true in Figures 20 and 21.

This can be explained by using the distance spectrum together with the union bound [12]. That
is, the performance is determined by the overall distance spectrum, not only by the free distance
term. The (2,1,7) time invariant code has $d_{free} = 10$, which is smaller than that of the periodic time
varying codes, but its $N_{d_{free}}$ and $I_{d_{free}}$ are fairly small, only 1 and 2. Comparing the other
distance spectrum terms with those of the periodic time varying codes, this time invariant code is
better for almost all distances. Therefore, in terms of distance spectrum, the three (2,1,7,2) codes
are denser than the time invariant code. The advantage the three (2,1,7,2) periodic time varying
convolutional codes gain from their larger free distance is offset by their relatively dense distance
spectrum.


Figure 19: Simulation of (2,1,7,2) No. 1.




4 Time-Varying Turbo Codes

4.1  Introduction 


Due to their near Shannon limit performance, turbo codes have attracted a great deal of attention
since their discovery in 1993 [13]. The BER performance of a turbo code may be divided into two
regions: the so-called waterfall region, in which the BER drops rapidly, and the error-floor region,
where the BER drops at a slower rate. It is desirable to have turbo codes that have fast-dropping
waterfalls and low error-floors when they are decoded with the iterative decoding algorithm.
However, it is challenging to design turbo codes that perform well in both regions due to conflicting
design constraints.

It is well-known that the error-floor is caused by the relatively small free distance of turbo codes
with pseudorandom interleavers [15, 16]. At moderate to high SNRs, a turbo code with a
pseudorandom interleaver will perform worse than conventional codes having large free distance. It is
intuitive to use large memory convolutional component codes to build turbo codes having large free
distances and hence lower error-floors. Unfortunately, the iterative decoding algorithm for turbo
codes generally does not converge for component codes with memory greater than 4 [19] and thus
the performance in the waterfall region is significantly compromised.

Although challenging, work has been done to design turbo codes that excel in both regions. One
example of this is the use of asymmetric turbo codes [17], that is, turbo codes that have one “weak”
component code, which performs well in the waterfall region, and one “strong” component code,
which gives a large free distance. An asymmetric turbo code consisting of the component code of the
Berrou turbo code and a recursive systematic code with a primitive feedback polynomial was
constructed in [17]. It turns out that this code is inferior to the Berrou turbo code in the waterfall
region, but has a much lower error-floor. The simulated BER performance of this code is shown in
Figure 22.


Another encoding scheme that combines “weak” and “strong” together is to use the Big-Numerator
Little-Denominator (BN-LD) convolutional codes discovered by Massey et al. [14]. In [14], the turbo
code constructed from a memory 8 BN-LD convolutional code with recursive systematic generator
matrix

$$G_{BN-LD}(D) = \left[\; 1 \quad \frac{1 + D + D^7 + D^8}{1 + D + D^2} \;\right]$$

is shown to have slightly better performance than the Berrou turbo code in both regions.
Note that $G_{BN-LD}(D)$ has a large-degree feedforward polynomial and a small-degree feedback
polynomial. It is conjectured that this feature enables the turbo code to perform well in all regions.
Simulation results of the original Berrou turbo code and the memory 8 BN-LD turbo code are
shown in Figure 22. As is indicated by the figure, the memory 8 BN-LD code outperforms the
Berrou code at all SNRs.


As an example to illustrate the tradeoff in performance between the two regions, the BER
performance of a turbo code based on memory 6 BN-LD component codes with generator matrix

$G_{BN-LD}(D)$

is also plotted in Figure 22. Due to its smaller memory, the decoding of the turbo code based on
this code is less complex than that of the memory 8 BN-LD turbo code and, since it is “weaker”
than the memory 8 BN-LD code, the turbo code constructed is expected to perform better in the
waterfall region. The EXIT analysis described in [19] provides a useful tool for predicting the BER
performance of turbo codes in the waterfall region. EXIT charts for the component codes of the
Berrou code, the memory 8 BN-LD code and the memory 6 BN-LD code are shown in Figures 23,
24, and 25, respectively. The EXIT charts show that at SNRs greater than -0.3 dB the iterative
decoding algorithm converges for both the Berrou and the memory 8 BN-LD codes, but the memory
6 BN-LD turbo code converges for SNRs greater than -0.4 dB. As shown in Figure 22, the BER
performance of the memory 6 BN-LD turbo code is in fact better in the waterfall region than both
the Berrou and memory 8 BN-LD turbo codes, but its error-floor is the worst among the three.

Figure 22: Simulated results of the Berrou turbo code, the memory 8 BN-LD turbo code, and the
memory 6 BN-LD turbo code with 18 decoding iterations. All codes are rate 1/3 with random
interleaver of length 16384.

Thus,  by  using  a  weaker  component  code,  the  performance  in  the  waterfall  region  improves  and  the 
decoding complexity decreases. However, the performance in the error-floor region is worsened,
presumably  due  to  a  decreased  free  distance.  The  question  remains  as  to  whether  or  not  the  waterfall 
performance  of  the  memory  6  BN-LD  code  can  be  retained  while  improving  its  performance  in  the 
error-floor  region.  In  this  paper,  time-varying  component  codes  will  be  used  to  address  this. 

Figure 23: EXIT chart of the Berrou turbo code with interleaver of length 16384.


Figure  24:  EXIT  chart  of  the  memory  8  BN-LD  turbo  code  with  interleaver  of  length  16384. 


4.2 Time-Varying Component Codes


A time-varying convolutional code [20] is a convolutional code generated by a convolutional
encoder whose generator matrix changes with time. Denote the generator matrix of a time-varying
convolutional encoder at time $t$ as $G_t(D)$. A time-varying convolutional encoder is said to be
periodic with period $T$ if $G_t(D) = G_{t+T}(D)$ for all $t = 0, 1, \ldots$. Since time-invariant convolutional
codes can be seen as generated by periodic time-varying convolutional encoders with period 1, they
are a subset of time-varying convolutional codes. It is conjectured that the use of time-varying
component codes, which can consist of combinations of weak and strong time-invariant codes, will
enable the construction of turbo codes with better performance in both regions.
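
A periodic time-varying recursive systematic encoder can be sketched as a single shift register whose feedforward and feedback taps are switched according to $t \bmod T$. This is an illustrative sketch under that assumed realization; the function name and the small polynomials in the examples are hypothetical and are not the codes of Table 8.

```python
def rsc_encode_tv(codes, bits):
    """Periodic time-varying rate-1/2 recursive systematic encoder.

    codes: list of T (numerator, denominator) pairs. Each polynomial is a
           tuple of GF(2) coefficients (c_0, c_1, ...), denominator[0] == 1.
    At time t the pair codes[t % T] is applied to a shared shift register.
    Returns a list of (systematic bit, parity bit) pairs.
    """
    T = len(codes)
    m = max(len(p) for pair in codes for p in pair) - 1
    reg = [0] * m                                   # reg[i-1] holds a_{t-i}
    out = []
    for t, u in enumerate(bits):
        num, den = codes[t % T]
        feedback = sum(den[i] & reg[i - 1] for i in range(1, len(den))) % 2
        a = u ^ feedback                            # value entering the register
        window = [a] + reg
        parity = sum(num[i] & window[i] for i in range(len(num))) % 2
        out.append((u, parity))
        reg = window[:m]
    return out


# Period 1: G(D) = [1, 1/(1 + D)], an accumulator.
print(rsc_encode_tv([((1,), (1, 1))], [1, 0, 0, 0]))

# Period 2: alternate between [1, 1 + D] and [1, 1/(1 + D)].
print(rsc_encode_tv([((1, 1), (1,)), ((1,), (1, 1))], [1, 0, 0, 0]))
```

With a single (numerator, denominator) pair the sketch reduces to an ordinary time-invariant recursive systematic encoder, which matches the remark that time-invariant codes are the period 1 special case.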

The first time-varying component code considered is a memory 6, period 2, rate 1/2 time-varying
convolutional code, denoted as PTVCC1, and described in Table 8. PTVCC1 is constructed
according to the same principles as in [14]; that is, it is a BN-LD code. The EXIT analysis
of this code, shown in Figure 26, indicates that the iterative decoder would start to converge for
SNRs greater than -0.5 dB and would converge relatively fast for SNRs greater than -0.4 dB.
Denote the turbo code based on PTVCC1 with a pseudorandom interleaver as TC1. The simulation
results for TC1 are plotted in Figure 27. The figure shows that the BER performance of TC1 is
about 0.1 dB better than that of the memory 8 BN-LD turbo code in the waterfall region, which is
consistent with the EXIT analysis. It also performs better than the memory 6 BN-LD turbo code
in both the waterfall and error-floor regions. However, the simulation also shows that TC1 still
has a much higher error-floor, suggesting a smaller free distance, than both the memory 8 BN-LD
turbo code and the Berrou turbo code.

Figure 25: EXIT chart of the memory 6 BN-LD turbo code with interleaver of length 16384.

Table 8: Generator matrices of the time-varying component codes used to build turbo codes.


In order to improve on the performance of TC1 in the error-floor region, a second period 2
time-varying component code based on memory 6 convolutional codes is constructed. This code,
denoted by PTVCC2, consists of a BN-LD code and a traditional high free distance convolutional code.
The parameters of this code are given in Table 8. The idea behind this code is that it would combine
the “weak” aspects of the BN-LD code and the “strong” aspects of the high free distance code. The
EXIT analysis for TC2 is shown in Figure 28 and indicates that the iterative decoder will converge
for SNRs above -0.4 dB. The performance of the resulting turbo code, denoted by TC2, based on
PTVCC2 is shown in Figure 27. As expected from the EXIT analysis, the performance in the waterfall
region is 0.1 dB worse than that of TC1. However, the performance in the error-floor region is
significantly improved.


As mentioned in the introduction, the most common method for improving performance in the
error-floor region is to use spread interleavers [18]. Figure 29 shows the performance of all five
turbo codes considered in this paper with the same spread interleaver with a spreading factor of
20. Though the spread interleaver improves the error-floor of all the turbo codes, the relative
performance remains unchanged. It is also noted that the error floors of both the turbo code with
the memory 8 BN-LD component code and the turbo code with PTVCC2 as component code disappear,
but the turbo code with PTVCC2 as component code achieves this with a lower decoding complexity
due to decreased memory.


4.3  Conclusion 

It is shown in this paper that good turbo codes can be constructed using time-varying convolutional
codes. A turbo code based on a simple period 2, memory 6, time-varying component code was shown
to outperform the memory 8 BN-LD code of [14] with less decoding complexity. By combining
the weak properties of BN-LD codes with the strong properties of traditional high free distance
convolutional codes, time-varying codes offer the possibility of improvement in both the waterfall
and error-floor regions. Work is continuing on finding turbo codes based on time-varying component
codes that show this.


Figure 26: EXIT chart of TC1 with interleaver of length 16384.

Figure 27: Simulated BER performance of the Berrou turbo code, the memory 8 BN-LD turbo
code, the memory 6 BN-LD turbo code, TC1, and TC2 with 18 decoding iterations. All codes are
rate 1/3 with random interleaver of length 16384.


Figure  28:  EXIT  chart  of  TC2  with  interleaver  of  length  16384. 

Figure 29: Simulated BER performance of the Berrou turbo code, the memory 8 BN-LD turbo
code, the memory 6 BN-LD turbo code, TC1, and TC2 with 18 decoding iterations. All codes are
rate 1/3 with spread interleaver of length 16384 and spreading factor 20.


5 Joint Source-Channel Coding

5.1  Introduction 


This  report  aims  at  comparing  different  joint  source  and  channel  coding  techniques  in  the  literature. 
Since  a  joint  source  and  channel  coding  system  performs  both  source  coding  and  channel  coding, 
the  performance  evaluation  of  such  a  system  must  be  based  on  how  well  it  compresses  the  data 
given a distortion constraint as well as how successfully it combats channel errors. In
this  study,  8-bit,  monochrome  images  are  taken  as  the  source  and  are  transmitted  over  a  binary 
symmetric  channel  (BSC).  Therefore,  the  performance  of  the  system  is  evaluated  using  the  following 
three  measures: 


1.  The  overall  rate  of  the  system  (taking  both  the  rate  reduction  due  to  source  coding  and 
rate  increase  due  to  channel  coding  into  consideration). 

2.  The  distortion  after  joint  source  and  channel  coding.  To  evaluate  the  distortion  of  the 
images,  the  signal-to-noise  ratio  (SNR)  measure  is  used. 

3.  Probability  of  channel  error  of  the  BSC. 


Given  these  three  factors,  the  goal  of  a  joint  source  and  channel  coding  system  is  to  minimize  the 
distortion  (therefore  maximize  the  SNR)  for  a  given  rate  and  probability  of  channel  error,  or  to 
minimize  the  overall  rate  for  a  given  distortion  and  channel  error  probability. 
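
The evaluation setup can be illustrated with a small simulation of uncoded transmission. The sketch below is an assumption-laden illustration: the report does not state the exact SNR formula, so the ratio of signal energy to squared-error energy in dB is assumed here, and the function names and sample pixel values are hypothetical.

```python
import math
import random


def bsc(pixels, p, rng):
    """Send 8-bit pixel values through a binary symmetric channel.

    Each of the 8 bits of every pixel is flipped independently with
    probability p.
    """
    received = []
    for x in pixels:
        mask = 0
        for bit in range(8):
            if rng.random() < p:
                mask |= 1 << bit
        received.append(x ^ mask)
    return received


def snr_db(original, received):
    """Assumed distortion measure: signal energy over squared-error energy, in dB."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, received))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


rng = random.Random(0)
pixels = [12, 200, 37, 255, 128, 64, 90, 3]     # stand-in for image data
for p in (0.0, 0.001, 0.01, 0.1):
    print(p, snr_db(pixels, bsc(pixels, p, rng)))
```

A full evaluation would also account for the overall rate, which this uncoded sketch fixes at 8 bits per pixel.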

The report is organized as follows: In Section 2, the motivation behind joint source and channel
coding is explained from a somewhat information theoretical point of view. In Section 3, a
classification for different approaches to joint source and channel coding is made, which covers a brief
discussion of the significant contributions to the area. In Section 4, a joint source and channel
coding scheme that exploits the residual redundancy of subband coded images is presented. In
Section 5, the performance of a wavelet based, progressive coding scheme is investigated for noisy
channels. In Section 6, the performance of channel-optimized vector quantization is investigated.


5.2  The  Idea  of  Joint  Source  and  Channel  Coding 

In classical communication systems, the source coder and the channel coder are designed
separately. This is because data compression does not depend on the channel, and error control
coding does not depend on the source distribution [22]. Shannon, in his original paper [21], proved
that the two-stage method is as good as any other method of transmitting information over a
noisy channel. This result, known as the separation theorem, has some important practical
implications. It implies that we can consider the design of a communication system as a
combination of two parts, source coding and channel coding. We can design source codes for
the most efficient representation of the data. We can separately and independently design channel
codes appropriate for the channel. The combination will be as efficient as anything we could design
by considering both problems together [22].

However, as we try to operate under more and more restrictive conditions, the separation of source
and channel coders becomes impractical to implement. It has been shown that the separation
does not hold for all channels [23]. Also, there are examples of multiuser channels, which are the
channel models for today’s wireless communication systems, where the decomposition breaks down.
Moreover, even for the channels where the separation holds, an optimal source and channel coder
pair must be designed, which is usually impractical.

Another reason why the separation theorem does not hold in practice is that it assumes that
the source encoder outputs an independent sequence for optimal channel coding. However, since
the source encoders of practical interest are not optimal, their outputs contain redundancy. Also,
the channel coders are assumed to be optimal in the sense that they reproduce the source encoder
output at the source decoder input with negligible distortion, which is an unrealistic assumption
because some channel errors remain uncorrected. Therefore, the non-optimality of practical source
and channel coders causes the separation principle to break down.

All  the  weaknesses  of  the  separation  theorem  mentioned  above  motivated  researchers  to  find  more 
efficient  ways  of  doing  source  and  channel  coding.  Various  approaches  to  the  solution  of  the  problem 
have  been  developed  and  are  usually  grouped  under  the  general  heading  of  joint  source  and  channel 
coding.  The  next  section  briefly  summarizes  these  approaches. 


5.3  Background 

The  joint  source  and  channel  coding  schemes  in  the  literature  can  be  classified  into  four  categories. 

The  first  class  of  these  schemes  is  the  joint  source  and  channel  coding  schemes,  named  as  such  since 
the  source  and  channel  coding  operations  are  truly  integrated  into  one  coder  structure.  The  most 
important  examples  of  this  category  include  the  work  of  Ancheta  [24]  and  Massey  [25],  the  work 
of  Dunham  and  Gray  [26]  and  Ayanoglu  and  Gray  [27],  who  investigated  the  design  of  joint  source 
and  channel  trellis  coders.  These  studies  are  rather  theoretical  works  and  very  hard  to  implement 
in  practice,  if  not  impossible. 

The second class consists of concatenated source and channel coding schemes. In this type of
coder, known source coders and known channel coders are cascaded, and an optimal rate allocation
between the source coder and the channel coder is performed for maximum system performance.
The work in this category [28]-[32] uses known source coding techniques such as different forms
of differential pulse code modulation (DPCM) (e.g., two-dimensional, backward adaptive,
embedded), the discrete cosine transform (DCT), and tree encoding, and concatenates them with
different forms of convolutional codes (e.g., short constraint length, self-orthogonal, punctured)
and Hamming codes. The important issue here is to find the optimal allocation of the fixed rate
between the source coder and the channel coder, as well as to ensure their cooperation.


In a third class, unequal error protection source and channel coders are considered. Schemes of
this type make use of the fact that channel errors in different bits have different effects on
the final reconstruction. Depending on the source coding scheme, errors in some of the bits cause
more distortion than errors in others. Therefore, the bits can be classified as important and
unimportant bits. The main idea behind unequal error protection is to heavily protect the
important bits at the expense of poor protection of the unimportant bits, resulting in better
system performance. Work in this field [33]-[35] includes different source coding schemes, such as
pulse code modulation (PCM) and subband coding, as well as different error protection methods.

The last category in the classification of joint source and channel coding schemes includes the
constrained joint source and channel coders. The source coders in this class are modified to account
for the presence of a noisy channel. In other words, the source coders are optimized subject to a
noisy channel constraint [36]-[38]. One subset of constrained joint source and channel coders
consists of those coders that use knowledge of the source coding properties to combat channel
errors. These studies [39]-[48] generally utilize the statistical properties of the source encoder
output, such as the residual redundancy, to detect and/or correct channel errors.


5.4  Joint  Source  and  Channel  Coding  of  Subband  Coded  Images  Using  Residual 
Redundancy 

In  this  section,  we  investigate  the  performance  of  a  joint  source  and  channel  coding  system  that 
exploits  the  residual  redundancy  at  the  source  encoder  output  to  detect  and  correct  channel  errors. 
The  source  coder  is  a  concatenation  of  a  subband  coder,  a  DPCM  coder  and  a  Huffman  coder.  For 
channel coding, a nonbinary convolutional encoder is proposed, which is optionally used to perform
error  correction.  The  channel  output  is  decoded  using  a  variable-length  list  Viterbi  decoder  and 
the  source  decoders.  The  overall  system  diagram  is  shown  in  Figure  30. 


Figure  30:  System  diagram  of  a  joint  source-channel  coding  scheme. 

Natural sources such as images usually have low-pass characteristics. Therefore, most of the
information in these sources tends to be in the low frequency bands. Subband coding is a source
coding scheme that decomposes the source into its subbands and codes each subband according to
its information content. One of the most commonly used ways of assessing the information content
of a subband is to calculate its energy. A subband that carries higher energy than other subbands
is taken to have greater information content. Therefore, a reasonable way of coding the subbands
of an image is to allocate more bits to the subbands with higher energy and fewer bits to the
subbands with lower energy. The problem of allocating bits to subbands, namely bit allocation, is
one of the challenging problems in source coding, and a number of different bit allocation schemes
have been proposed in the literature. These schemes aim at minimizing the overall distortion
subject to a given rate constraint.

The  subband  decomposition  for  images  is  often  carried  out  in  the  following  manner:  First,  the  rows 
of  the  image  are  filtered.  Usually,  two  types  of  decomposition  filters  are  used:  low-pass  and  high- 
pass.  After  filtering  the  rows  using  low-pass  and  high-pass  filters,  we  end  up  with  two  subbands 
of  the  image:  low  band  and  high  band.  Next,  the  outputs  of  the  filters  are  subsampled.  The 
justification  for  the  subsampling  is  the  Nyquist  rule  which  states  that  twice  as  many  samples  per 
second  as  the  range  of  frequencies  suffice  for  perfect  reconstruction.  Since  after  filtering,  the  range 
of  frequencies  for  each  subband  is  halved  as  compared  to  the  original  image,  we  need  only  half  of 
the  samples  at  the  output  of  each  filter.  Therefore,  a  2:1  subsampling  can  be  done  without  loss  of 
any  information. 

After low-pass and high-pass filtering of the rows and subsampling, exactly the same operations are
performed on the columns of both the low and high bands. The two-stage filtering operation on the
rows and columns is equivalent to a two-dimensional filtering. After filtering and decimation, four
subbands or subimages are obtained. The subimage obtained by low-pass filtering the rows and
columns is called the low-low (LL) image. The subimage obtained by low-pass filtering the rows
and high-pass filtering the columns is called the low-high (LH) image. The subimage obtained by
high-pass filtering the rows and low-pass filtering the columns is called the high-low (HL) image.
Finally, the subimage obtained by high-pass filtering the rows and columns is called the high-high
(HH) image. Since subsampling is performed after each filtering operation, an image of dimension
N x N results in four subimages of dimensions N/2 x N/2. The subband coding procedure is
depicted in Figure 31.
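
As an illustration of the row and column filtering just described, the following minimal sketch performs a one-level decomposition in Python. It is not the coder used in this study: the two-tap Haar pair stands in for the decomposition filters as an assumption, and the function names (`split_1d`, `split_rows`, `subband_split`) are hypothetical.

```python
import math

def split_1d(signal):
    """Haar low-pass/high-pass filtering of a sequence followed by 2:1 subsampling."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (signal[k] + signal[k + 1]) for k in range(0, len(signal), 2)]
    high = [s * (signal[k] - signal[k + 1]) for k in range(0, len(signal), 2)]
    return low, high

def transpose(m):
    return [list(col) for col in zip(*m)]

def split_rows(image):
    """Apply split_1d to every row; return the low band and the high band."""
    pairs = [split_1d(row) for row in image]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def subband_split(image):
    """Return the LL, LH, HL and HH subimages of an N x N image (N even)."""
    low, high = split_rows(image)              # filter and subsample the rows
    ll_t, lh_t = split_rows(transpose(low))    # columns of the low band
    hl_t, hh_t = split_rows(transpose(high))   # columns of the high band
    return transpose(ll_t), transpose(lh_t), transpose(hl_t), transpose(hh_t)
```

Each returned subimage has dimensions N/2 x N/2. Because the Haar pair assumed here is orthonormal, the total energy of the four subimages equals the energy of the input image, which makes per-subband energy a meaningful basis for bit allocation.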


Figure  31:  System  diagram  of  a  subband  coding  scheme. 

One  of  the  most  important  design  problems  in  subband  coding  is  the  filter  implementation.  In  the 
literature,  there  are  a  number  of  filter  pairs  (low-pass  and  high-pass)  proposed  for  subband  coding. 
These filters are often realized as causal, finite impulse response (FIR) filters. Another type of filter
often used in subband coding performs a wavelet decomposition, which is also implemented with FIR filters.

In the proposed scheme, after the subband decomposition and after the number of bits allocated
to each subimage has been decided, the subimages are encoded using DPCM, a differential coding
scheme in which the difference values to be quantized and coded are minimized using linear
prediction. Next, the output of the DPCM encoder is further encoded using Huffman coding.
Huffman coding is a lossless variable-length coding scheme; therefore, the output of our
concatenated source coder has variable-length codewords.

The test image used throughout the simulations is the Sena image shown in Figure 32. The subband
coder uses 9/7-tap FIR filters that implement a wavelet decomposition [58]. The image edges are
reflected, exploiting the symmetric structure of the filter coefficients, to reduce the distortion. The
DPCM coder used in this study has a three-bit uniform quantizer and a third-order predictor
that predicts the value of a subband coefficient from three neighboring coefficients. The predictor
coefficients are calculated from the autocorrelation function of the image so as to minimize the
mean squared error; this amounts to solving a modified form of the well-known Wiener-Hopf
equations. The DPCM coder is used to code only the LL subimage, since the rate allocation schemes
used in the study allocate all three bits to the low-low subband. The codebook of the Huffman
coder is formed using a training image and is therefore assumed to be known at both the encoder
and the decoder.
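
The DPCM stage can be sketched as follows. This is only an illustration under stated assumptions: the choice of the left, upper and upper-left neighbors, the predictor coefficients (0.75, 0.75, -0.5), and the quantizer step size are placeholders rather than the values used in this study, and all function names are hypothetical.

```python
def quantize(e, step, levels=8):
    """Uniform quantizer: map a prediction error to an index in 0..levels-1."""
    idx = int(e // step) + levels // 2
    return max(0, min(levels - 1, idx))      # clip overload to the outer cells

def dequantize(idx, step, levels=8):
    """Reconstruct the midpoint of quantizer cell idx."""
    return (idx - levels // 2 + 0.5) * step

def predict(rec, r, c, coeffs):
    """Third-order prediction from the left, upper and upper-left neighbors."""
    left = rec[r][c - 1] if c > 0 else 0.0
    up = rec[r - 1][c] if r > 0 else 0.0
    diag = rec[r - 1][c - 1] if r > 0 and c > 0 else 0.0
    return coeffs[0] * left + coeffs[1] * up + coeffs[2] * diag

def dpcm_encode(band, step, coeffs=(0.75, 0.75, -0.5)):
    """Return the 3-bit symbols and the reconstruction the decoder will form."""
    rec = [[0.0] * len(band[0]) for _ in band]
    symbols = [[0] * len(band[0]) for _ in band]
    for r, row in enumerate(band):
        for c, value in enumerate(row):
            pred = predict(rec, r, c, coeffs)   # uses reconstructed values only
            symbols[r][c] = quantize(value - pred, step)
            rec[r][c] = pred + dequantize(symbols[r][c], step)
    return symbols, rec

def dpcm_decode(symbols, step, coeffs=(0.75, 0.75, -0.5)):
    """Rebuild the subimage from the symbols with the same predictor."""
    rec = [[0.0] * len(symbols[0]) for _ in symbols]
    for r, row in enumerate(symbols):
        for c, idx in enumerate(row):
            rec[r][c] = predict(rec, r, c, coeffs) + dequantize(idx, step)
    return rec
```

The encoder predicts from its own reconstructed values rather than from the original coefficients, so that in the absence of channel errors the decoder forms exactly the same predictions.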


Figure  32:  Test  image  used  for  simulation  results. 

At the output of the channel, a list Viterbi decoder is used whose branch metric accounts for both
the residual structure in the source encoder output and the channel probability of error.
More precisely, if we denote the output of the source encoder by the sequence $\{y_i\}$ and the
channel output by $\{\hat{y}_i\}$, the branch metric $L$ is computed as

$$L = \log P(y_i \mid y_{i-1}) + \log P(\hat{y}_i \mid y_i). \qquad (3)$$

The first term in (3) is the log transition probability from symbol $y_{i-1}$ to $y_i$. If the source
encoder were a perfect coder, it would produce uncorrelated symbols. However, since it is not ideal,
there is still some structure in the output sequence. This is often expressed as residual redundancy.
This redundancy helps the decoder correct channel errors. The second term accounts for the channel
effect and is subject to a practical scaling factor. As the channel probability of error gets smaller,
the second term dominates the first term, which makes sense because at small error probabilities it
is less likely that a bit is in error, so the received bits can be trusted. In a noisier channel,
however, the bits are more likely to be in error, so the first term, which is a measure of the
residual redundancy at the source encoder output, should prevail over the second term. The source
statistics that make up the first term in the branch metric are calculated using a training image
that has statistical characteristics similar to those of the image transmitted. The second term is
calculated assuming that the channel is a BSC with known transition probability.
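
A minimal sketch of the branch metric in (3) for a BSC is given below. The data layout is an assumption made for illustration: `trans_prob[a][b]` is taken to hold the trained estimate of $P(y_i = b \mid y_{i-1} = a)$, symbols are compared with the received bits as equal-length bit strings, and `scale` stands for the practical scaling factor mentioned above, whose value is not specified here.

```python
import math

def branch_metric(prev_sym, sym, sym_bits, recv_bits, trans_prob, p, scale=1.0):
    """Return log P(y_i | y_{i-1}) + scale * log P(received bits | y_i) on a BSC."""
    # Hamming distance between the hypothesized codeword and the received bits
    d = sum(a != b for a, b in zip(sym_bits, recv_bits))
    n = len(sym_bits)
    channel = d * math.log(p) + (n - d) * math.log(1.0 - p)
    source = math.log(trans_prob[prev_sym][sym])
    return source + scale * channel
```

For small p the factor log(p) makes every bit disagreement very expensive, so the channel term dominates; as p approaches 1/2 the channel term becomes nearly the same for all hypotheses and the source term decides.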
The list Viterbi decoder generates a list of B globally best candidate sequences after a trellis
search. Through the trellis, for each node, the B branches with minimum costs survive out of the
N x B candidates, where N is the number of states in the trellis. Depending on the length of the
list (B) and the number of states (N), the algorithm can be substantially complex in terms of the
number of computations required. It should also be noted that, since the source encoder outputs
variable-length symbols (due to the Huffman encoder), the list Viterbi decoder incorporates them
in the trellis in the form of variable-length states. Therefore, different paths entering a state use
up a different number of bits from the received sequence.

The image to be transmitted is encoded on a row-by-row basis, and the encoded rows are packetized
and transmitted over the BSC. Each packet also carries the number of bits that it conveys. This
information is carried in the header and is assumed to be error-free. Among the paths in the list,
the path with the smallest cost and the correct number of bits is chosen and decoded as the received
sequence. If no path in the list has the correct number of bits, the one with the smallest cost
is chosen.
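
The selection rule described above can be written compactly. In this sketch each candidate is assumed to be a tuple `(cost, num_bits, symbols)` produced by the list decoder; that layout and the function name are hypothetical.

```python
def select_path(candidates, expected_bits):
    """Pick the cheapest path with the right bit count, else the cheapest overall."""
    matching = [c for c in candidates if c[1] == expected_bits]
    pool = matching if matching else candidates
    return min(pool, key=lambda c: c[0])
```

The bit count carried in the packet header thus acts as a simple consistency check on the decoded sequence.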

This system has been simulated for a BSC with transition probabilities ranging from $p = 10^{-5}$ to
$p = 10^{-1}$. The performance is evaluated using the peak signal-to-noise ratio (PSNR) as the quality
measure. The PSNR is defined as

$$\mathrm{PSNR} = 10 \log_{10} \frac{u_p^2}{\frac{1}{N} \sum_{i=1}^{N} (u_i - \hat{u}_i)^2} \qquad (4)$$

where $u_i$ is the actual pixel value, $\hat{u}_i$ is the reconstructed pixel value, $N$ is the number of
pixels, and $u_p$ is the maximum pixel value, which is 255.
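
For reference, the PSNR measure can be computed as in the following sketch, which assumes the two images are supplied as flat sequences of pixel values of equal length; the function name is hypothetical.

```python
import math

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    n = len(original)
    mse = sum((u - v) ** 2 for u, v in zip(original, reconstructed)) / n
    if mse == 0:
        return math.inf          # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```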

The simulation results are shown in Figure 33 (green curve). The PSNR value of the
reconstructed image starts at 34.879 dB and decreases gracefully as the channel gets noisier. The
rate of the overall system is 0.463 bpp. It should be noted that the 3-bit DPCM coder encodes only
the LL subimage, which contains one quarter of the pixels and therefore gives a rate of
3/4 = 0.75 bpp. The Huffman encoder further reduces this rate to 0.463 bpp. Considering that the
input to the Huffman encoder is already source coded data, a lossless compression ratio of
0.75/0.463 = 1.62 is substantial. This relatively high compression ratio is due to the uniform
structure of the quantizer used in the DPCM coder.

Next,  to  make  the  performance  of  the  proposed  system  more  robust  to  channel  errors,  some  amount 
of  redundancy  is  added  to  the  source  coder.  A  nonbinary  convolutional  encoder  (NCE)  [44]  is  used 
for  this  purpose.  The  structure  of  the  rate  1/2  NCE  is  shown  in  Figure  34. 

Figure 33: Performance comparison of concatenated joint source-channel coding schemes with
and without a nonbinary convolutional code.

The input to the NCE, $x_n$, is the output of the 3-bit DPCM encoder, which is selected from the
alphabet $\{0, 1, 2, \ldots, N-1\}$, where $N = 2^3 = 8$. As the output of the rate 1/2 NCE is related
to the input by the relation $y_n = N x_{n-1} + x_n$, the output alphabet becomes
$\{0, 1, 2, \ldots, M-1\}$, where $M = N^2 = 64$. This NCE outputs 6 bits for every 3 bits input to
it. Therefore, it is a rate 1/2 coder. Given any 6-bit output symbol at time $n-1$, the NCE can
output only a limited number of symbols from its output alphabet at time $n$. Specifically, given a
value for $y_{n-1}$, $y_n$ can take on a value from
$\{\alpha N, \alpha N + 1, \alpha N + 2, \ldots, \alpha N + N - 1\}$, where $\alpha = y_{n-1} \bmod N$.
For example, if at time $n-1$ the output is 19 (010011), then $\alpha = 3$ and the output of the NCE
at time $n$ can only be one of the following symbols: {24, 25, 26, 27, 28, 29, 30, 31}. Notice that
while the encoder output alphabet is of size $N^2$, at any given instant the encoder can only emit
one of $N$ different symbols. This property of the rate 1/2 NCE gives the channel decoder an improved
ability to detect and correct channel errors at the expense of increased rate. It should also be
noted that the NCE is designed to ensure that its input alphabet matches the output alphabet of the
DPCM encoder. This helps the residual structure in the DPCM output to be maintained for channel coding.
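
The input-output relation of the rate 1/2 NCE is simple enough to sketch directly. The initial state `initial` (the value used in place of the input preceding the first symbol) is an assumption, since it is not specified in the text, and the function names are hypothetical.

```python
def nce_encode(symbols, n_levels=8, initial=0):
    """Rate 1/2 NCE: y_n = N * x_{n-1} + x_n, starting from an assumed initial state."""
    out, prev = [], initial
    for x in symbols:
        out.append(n_levels * prev + x)
        prev = x
    return out

def allowed_next(y_prev, n_levels=8):
    """Output symbols that may follow y_prev: alpha*N ... alpha*N + N - 1."""
    alpha = y_prev % n_levels
    return [alpha * n_levels + k for k in range(n_levels)]
```

For instance, `allowed_next(19)` returns the eight symbols 24 through 31, matching the example in the text. A decoder can use this constraint to rule out any received symbol sequence that violates it.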

The results for the proposed system with the rate 1/2 NCE are shown in Figure 33 (red curve). It
is clear that the PSNR values remain almost constant until $p = 10^{-2}$. The use of the NCE makes
the system more robust to channel errors at the cost of increased rate as compared to the system
without the NCE. It should be noted, however, that using the NCE increases the number of states
in the decoding trellis substantially (depending on the rate), which yields a more complex system
in terms of the number of computations required. The rate of the overall system is 0.818 bpp,
whereas the rate of the system without the NCE was 0.463 bpp. It might be expected that using a rate
1/2 NCE would double the rate, i.e., increase it to 2 x 0.463 = 0.926 bpp. However, since the Huffman
encoder is at the output of the NCE, it helps to reduce this rate to 0.818 bpp, which corresponds to
a compression ratio of 0.926/0.818 = 1.13.


5.4.1  Comparison  of  the  Joint  System  to  a  Separated  System 

The performance of the previous joint source and channel coding system using the rate 1/2 NCE
was compared to the traditional separated system shown in Figure 35. The convolutional code used
was the maximum free distance rate 1/2 memory 6 code. The average rate of the separated system
was 0.95 bpp. The simulated channel was the additive white Gaussian noise (AWGN) channel, using
both hard and soft decisions at the decoders. The channel is assumed to be power constrained, so in
order to compare systems with different rates, the performances are plotted versus $E_p/N_0$, where
$E_p$ is the energy per pixel. The energy of each transmitted bit, $E_b$, is determined by the rate of the
system, $R$ (bpp), according to $E_b = E_p/R$. Figure 36 shows the simulation results. The simulated
performances show that the separated system outperforms the joint system except at small SNR.
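
The relation $E_b = E_p/R$ translates into a fixed offset in decibels, as the short sketch below shows. The function name is hypothetical, and the rates used in the usage note are those quoted in the text for the two systems.

```python
import math

def ebn0_db(epn0_db, rate_bpp):
    """Eb/N0 in dB for a given Ep/N0 in dB and system rate R in bits per pixel."""
    return epn0_db - 10.0 * math.log10(rate_bpp)
```

At the same $E_p/N_0$, a system at 0.818 bpp spends about 0.65 dB more energy on each transmitted bit than a system at 0.95 bpp, since 10 log10(0.95/0.818) is approximately 0.65. Plotting against $E_p/N_0$ accounts for this difference automatically.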


5.5  Progressive  Image  Transmission  over  Noisy  Channels 

In  this  section,  we  will  present  one  of  the  novel  source  coding  schemes  in  the  literature,  and 
investigate  its  performance  over  noisy  channels. 

Set  Partitioning  In  Hierarchical  Trees  (SPIHT)  [57]  is  one  of  today’s  most  successful  and  practical 
image  coders  for  the  noiseless  channel.  It  has  been  shown  to  outperform  almost  any  other  existing 
source  coding  scheme.  It  is  computationally  simple  and  has  a  progressive  mode  of  transmission, 
which  means  that  as  more  bits  are  transmitted,  better  quality  reconstructed  images  can  be  produced 
at the receiver. The receiver need not wait for all bits to arrive before decoding; it refines the
decoded  image  with  the  arrival  of  each  bit  of  information. 

SPIHT is a wavelet-based coding technique that uses the 9/7-tap FIR filters discussed
in the previous section. However, the depth of decomposition is different: SPIHT further decomposes
the low-low subimages, using the fact that most of an image’s energy is concentrated in the low
frequency components. We can view the output of a subband coder as in Figure 37.

Figure 34: Rate 1/2 nonbinary convolutional encoder.

Figure 35: Block diagram of a separated system.

Figure 36: Performance comparison of joint and separated systems.

Since the energy is concentrated in the low frequency components, we can keep decomposing the
low-low subimages to perform finer coding on those components. In Figure 38(a), the low-low subimage
has been further decomposed to obtain a 2-level decomposition. In this case, the decomposition
results in 7 subbands. For an image of dimension N x N, the four subbands of the low-low
subimage, namely the LLLL, LLLH, LLHL, and LLHH subbands, will have dimensions of N/4 x N/4
since subsampling is employed after each filtering operation. The remaining low-high, high-low
and high-high subbands will have dimensions of N/2 x N/2. In Figure 38(b), the image has been
decomposed five times, resulting in 16 subbands. The four subbands in the highest level of the
pyramid have dimensions of N/32 x N/32. This form of image decomposition is often referred to as
hierarchical subband transformation or pyramid transformation, and the lower bands correspond to
the higher levels of the pyramid.
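
The subband counts and dimensions quoted above follow from repeatedly splitting the low-low band, as this small sketch shows. The function name and the image size of 512 used in the usage note are assumptions made for illustration.

```python
def pyramid_subbands(n, levels):
    """List (name, side length) of every subband of an n x n image after `levels` splits."""
    bands, prefix, size = [], "", n
    for _ in range(levels):
        size //= 2                                   # 2:1 subsampling per level
        bands += [(prefix + tag, size) for tag in ("LH", "HL", "HH")]
        prefix += "LL"                               # only the low-low band is split again
    bands.append((prefix, size))                     # the final low-low band
    return bands
```

With `n = 512`, two levels give 7 subbands with LLLL, LLLH, LLHL and LLHH of side 128, and five levels give 16 subbands whose smallest side is 512/32 = 16. In general, an L-level decomposition yields 3L + 1 subbands.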

To understand the motivation behind progressive image transmission, let us define the following
notation. Let $p_{i,j}$ be the pixel value of the image at coordinate $(i,j)$. Let $\Omega(\cdot)$ be the
unitary hierarchical transformation defined as $c = \Omega(p)$, where the two-dimensional array $c$
has the same dimensions as $p$, and $c_{i,j}$ is the transform coefficient at coordinate $(i,j)$. In a
progressive transmission scheme, the decoder initially sets the reconstruction vector $\hat{c}$ to
zero and updates its components according to the coded message. After receiving the value of some
coefficients, the decoder can obtain a reconstructed image using the inverse transform:
$\hat{p} = \Omega^{-1}(\hat{c})$.

Figure 37: 1-level decomposition used in SPIHT.

Figure 38: Two-level and five-level decompositions used in SPIHT.

Progressive transmission can be viewed as a way of transmitting the most important information
(that which yields the largest distortion reduction) first. If we use the mean squared error (MSE)
as the distortion measure,

$$D_{mse}(p - \hat{p}) = \frac{\|p - \hat{p}\|^2}{N} = \frac{1}{N} \sum_i \sum_j (p_{i,j} - \hat{p}_{i,j})^2, \qquad (5)$$

where $N$ here denotes the number of image pixels, then we can use the property that the Euclidean
norm is invariant to the unitary transformation. Therefore,

$$D_{mse}(p - \hat{p}) = D_{mse}(c - \hat{c}) = \frac{1}{N} \sum_i \sum_j (c_{i,j} - \hat{c}_{i,j})^2. \qquad (6)$$

If the exact value of the transform coefficient $c_{i,j}$ is sent to the decoder, then the MSE decreases
by $|c_{i,j}|^2/N$. This means that the coefficients with larger magnitude should be transmitted first
because they have a larger content of information. It follows that the values of $|c_{i,j}|$ can be
ranked according to their binary representation, and the most significant bits transmitted first. In
this case, the ordering information should also be transmitted, which increases the rate. However,
it is shown in [57] that this method of transmitting the information is very efficient despite the
fact that a large fraction of the bit budget is spent on the transmission of ordering information.

The SPIHT algorithm removes the need for explicit ordering information by transmitting it
implicitly. It is based on the fact that the execution path of any algorithm is defined by the
results of the comparisons at its branching points. Therefore, if the encoder and the decoder have
the same sorting algorithm, then the decoder can duplicate the encoder’s execution path if it
receives the results of the magnitude comparisons, and the ordering information can be recovered
from the execution path.

To  reduce  the  number  of  magnitude  comparisons,  a  set  partitioning  rule  is  defined  that  uses  an 
expected  ordering  in  the  hierarchy  defined  by  the  subband  pyramid.  The  objective  is  to  create  new 
partitions  such  that  subsets  expected  to  be  insignificant  contain  a  large  number  of  elements,  and 
subsets  expected  to  be  significant  contain  only  one  element. 

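For this purpose the following function is defined:

$$S_n(\mathcal{T}) = \begin{cases} 1, & \max_{(i,j) \in \mathcal{T}} \{|c_{i,j}|\} \ge 2^n \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$

to indicate the significance of a set of coordinates $\mathcal{T}$. The significance of a single pixel
value is denoted by $S_n(i,j)$.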
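
The significance test in (7) amounts to a single comparison, sketched below. The representation of the coefficients as a nested list and of a set as a list of `(i, j)` pairs is an assumption, and the function name is hypothetical.

```python
def significance(coeffs, coords, n):
    """S_n(T): 1 if some |c[i][j]| with (i, j) in coords is at least 2**n, else 0."""
    largest = max((abs(coeffs[i][j]) for i, j in coords), default=0)
    return 1 if largest >= 2 ** n else 0
```

An empty set is treated as insignificant here, which is a convention chosen for this sketch.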

The  hierarchical  relationship  between  the  coefficients  in  the  subband  pyramid  is  defined  by  a  tree 
structure,  called  the  spatial  orientation  tree.  The  tree  is  formed  in  such  a  way  that  each  node  has 
either no offspring or four offspring, which always form a group of 2 x 2 adjacent coefficients. The
coefficients  in  the  highest  level  of  the  pyramid  are  the  tree  roots  and  are  also  grouped  in  2  x  2 
adjacent  coefficients.  However,  their  offspring  branching  rule  is  different,  and  in  each  group  one  of 
them  has  no  descendants.  For  a  2-level  decomposition  (or  2-level  pyramid),  the  structure  of  the 
spatial  orientation  tree  is  depicted  in  Figure  39. 


Figure 39: Spatial orientation tree used in SPIHT.

The following sets of coordinates are used to present the new coding method:

•  $\mathcal{O}(i,j)$: set of coordinates of all offspring of node $(i,j)$;

•  $\mathcal{D}(i,j)$: set of coordinates of all descendants of the node $(i,j)$;

•  $\mathcal{H}$: set of coordinates of all spatial orientation tree roots (nodes in the highest pyramid level);

•  $\mathcal{L}(i,j) = \mathcal{D}(i,j) - \mathcal{O}(i,j)$.

In  order  to  implicitly  transmit  the  ordering  information,  a  set  partitioning  rule  is  defined  according 
to  the  following  principles: 

1.  the initial partition is formed with the sets $\{(i,j)\}$ and $\mathcal{D}(i,j)$ for all $(i,j) \in \mathcal{H}$;

2.  if $\mathcal{D}(i,j)$ is significant then it is partitioned into $\mathcal{L}(i,j)$ plus the four single-element sets with
$(k,l) \in \mathcal{O}(i,j)$;

3.  if $\mathcal{L}(i,j)$ is significant then it is partitioned into the four sets $\mathcal{D}(k,l)$, with $(k,l) \in \mathcal{O}(i,j)$.

The SPIHT algorithm works according to the significance of the sets that it partitions as well as
the significance of single coordinates. For this purpose, three lists are used: the list of insignificant
sets (LIS), the list of insignificant pixels (LIP), and the list of significant pixels (LSP). The LIP and
LSP store individual pixel coordinates $(i,j)$, whereas the LIS stores the sets $\mathcal{D}(i,j)$ or $\mathcal{L}(i,j)$.

The  SPIHT  algorithm,  using  the  concepts  given  above,  is  as  follows: 

1. Initialization: output $n = \lfloor \log_2 ( \max_{(i,j)} \{ |c_{i,j}| \} ) \rfloor$; set the LSP as an empty list, and add the
coordinates $(i,j) \in \mathcal{H}$ to the LIP, and only those with descendants also to the LIS, as type
$\mathcal{D}$ entries.

2. Sorting pass:

   (a) for each entry $(i,j)$ in the LIP do:

       i. output $S_n(i,j)$;

       ii. if $S_n(i,j) = 1$ then move $(i,j)$ to the LSP and output the sign of $c_{i,j}$;

   (b) for each entry $(i,j)$ in the LIS do:

       i. if the entry is of type $\mathcal{D}$ then

          • output $S_n(\mathcal{D}(i,j))$;

          • if $S_n(\mathcal{D}(i,j)) = 1$ then

            - for each $(k,l) \in \mathcal{O}(i,j)$ do:

              * output $S_n(k,l)$;

              * if $S_n(k,l) = 1$ then add $(k,l)$ to the LSP and output the sign of $c_{k,l}$;

              * if $S_n(k,l) = 0$ then add $(k,l)$ to the end of the LIP;

            - if $\mathcal{L}(i,j) \neq \emptyset$ then move $(i,j)$ to the end of the LIS, as an entry of type $\mathcal{L}$; else,
              remove entry $(i,j)$ from the LIS;

       ii. if the entry is of type $\mathcal{L}$ then

          • output $S_n(\mathcal{L}(i,j))$;

          • if $S_n(\mathcal{L}(i,j)) = 1$ then

            - add each $(k,l) \in \mathcal{O}(i,j)$ to the end of the LIS as an entry of type $\mathcal{D}$;

            - remove $(i,j)$ from the LIS.

3. Refinement pass: for each entry $(i,j)$ in the LSP, except those included in the last sorting
pass (i.e., with same $n$), output the $n$-th most significant bit of $|c_{i,j}|$.

4. Quantization-step update: decrement $n$ by 1 and go to Step 2.
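
The significance test $S_n(\cdot)$ and the initial value of $n$ used in Step 1 can be illustrated with a short Python sketch. The function names (`initial_n`, `significance`) and the toy coefficient values are illustrative assumptions; sets are passed simply as lists of coordinates, and a set is taken to be significant when at least one of its coefficients has magnitude of at least $2^n$.

```python
import math

def initial_n(coeffs):
    """Return floor(log2(max |c_ij|)) for a dict {(i, j): coefficient}."""
    max_mag = max(abs(c) for c in coeffs.values())
    return int(math.floor(math.log2(max_mag)))

def significance(coeffs, coords, n):
    """S_n(T): 1 if any coefficient in the set T has magnitude >= 2**n, else 0."""
    return 1 if any(abs(coeffs[ij]) >= 2 ** n for ij in coords) else 0

# Toy 2x2 "transform" (hypothetical values, for illustration only).
coeffs = {(0, 0): 34.0, (0, 1): -20.0, (1, 0): 9.0, (1, 1): -3.0}

n = initial_n(coeffs)                      # floor(log2(34)) = 5
rest = [(0, 1), (1, 0), (1, 1)]
print(n)                                   # 5
print(significance(coeffs, [(0, 0)], n))   # 1: |34| >= 32
print(significance(coeffs, rest, n))       # 0: all magnitudes < 32
print(significance(coeffs, rest, n - 1))   # 1: |-20| >= 16
```

A set that is insignificant at level $n$ costs a single output bit, which is why large groups of small coefficients are cheap to encode in the early passes.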


The decoding operation is almost exactly the same as the encoding operation. The decoder duplicates
the encoder’s execution path as it sorts the significant coefficients. The three lists (LIS, LIP
and LSP) are the same at both the encoder and the decoder at the same pass, which means that
the decoder implicitly recovers the ordering from the execution path, without the need for explicit
ordering information. For the value of $n$ at which a coordinate is moved to the LSP, it is known that
$2^n \le |c_{i,j}| < 2^{n+1}$. The decoder uses that information, plus the sign bit that is input just after
the insertion in the LSP, to set $\hat{c}_{i,j} = \pm 1.5 \times 2^n$. During the refinement pass, the decoder
adds or subtracts $2^{n-1}$ to $\hat{c}_{i,j}$ when it inputs the bits of the binary representation of $|c_{i,j}|$. In this
manner, the distortion gradually decreases during both the sorting and refinement passes with the
receipt of each bit, which ensures a fully progressive mode of transmission.
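
The decoder's reconstruction rule can be traced for a single coefficient with the following Python sketch. It assumes an integer-valued coefficient magnitude and uses hypothetical helper names (`refinement_bits`, `reconstruct`); it illustrates only the update rule described above, not a complete SPIHT decoder.

```python
def refinement_bits(magnitude, n_sig, n_last):
    """Encoder side: bits n_sig-1, ..., n_last of an integer magnitude."""
    return [(magnitude >> m) & 1 for m in range(n_sig - 1, n_last - 1, -1)]

def reconstruct(sign, n_sig, bits):
    """Decoder side: start at 1.5 * 2**n_sig, then add or subtract 2**(m-1)
    for the refinement bit received in the pass with threshold exponent m."""
    value = 1.5 * 2 ** n_sig
    trace = [sign * value]
    m = n_sig - 1
    for b in bits:
        step = 2.0 ** (m - 1)
        value += step if b else -step
        trace.append(sign * value)
        m -= 1
    return trace

# Coefficient c = -45 becomes significant at n = 5, since 32 <= 45 < 64.
bits = refinement_bits(45, 5, 0)
trace = reconstruct(-1, 5, bits)
print(bits)    # [0, 1, 1, 0, 1]
print(trace)   # [-48.0, -40.0, -44.0, -46.0, -45.0, -45.5]
```

Each reconstructed value is the midpoint of the interval that the bits received so far allow for the magnitude, so the uncertainty interval halves with every refinement bit.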

The SPIHT algorithm has been used for source coding of our test image. For a noiseless channel,
the rate versus distortion performance obtained is shown in Figure 40.


Figure 40: SPIHT’s rate versus PSNR performance for the test image.

It is evident that SPIHT achieves relatively high PSNR values even at low rates. At 0.5 bpp, for
example, a PSNR of 40.30 dB is obtained, which provides a sufficiently high image quality. To
illustrate the high performance of SPIHT, it is compared to vector quantization (VQ). The LBG
algorithm [49] is used for vector quantizer design, and the splitting algorithm is used for
initialization. Table 9 shows the comparison results. The training image used to form the codebook
is statistically similar to the test image.


Table 9: Comparison of SPIHT versus VQ as a function of rate

Rate (bpp)    PSNR (dB)
              VQ       SPIHT
0.125         24.71    32.71
0.25          28.26    36.32
0.5           32.18    40.30

From Table 9, it is evident that SPIHT outperforms VQ by about 8 dB at each of the three rates,
which is a substantial difference. The reconstructed images using VQ and SPIHT at 0.25 and
0.125 bpp are shown in Figure 41. The images in the first column are coded using VQ, and the
images in the second column are coded using SPIHT. The images in the first row have a rate of
0.25 bpp, whereas the images in the second row have a rate of 0.125 bpp.


Figure 41: Reconstructed images using VQ (first column) and SPIHT (second column).

While SPIHT has very high performance in noiseless channels, it is very sensitive to channel errors
in noisy channels. Channel errors can cause an unrecoverable loss of synchronization between
the encoder and the decoder, which often results in total collapse of the reconstructed image. In
fact, the vast majority of images transmitted using this progressive wavelet-based algorithm will
collapse if even a single transmitted information bit is incorrectly decoded at the receiver.

SPIHT’s  high  source  coding  performance  has  led  many  researchers  to  find  efficient  channel  coding 
schemes  to  reduce  its  excessive  vulnerability  to  channel  errors.  The  work  of  Sherwood  and  Zeger 
[59]  is  one  of  the  most  successful  studies  to  make  SPIHT  robust  to  noisy  channels.  They  partition 
the  output  bit  stream  of  the  SPIHT  coder  into  consecutive  blocks  of  length  N,  where  they  take 
N  =  200.  Next,  they  add  c  (c  =  16)  checksum  bits  based  on  the  N  bits  along  with  zero  bits  to 
flush  the  memory  units  of  the  convolutional  encoder.  The  convolutional  coder  used  in  this  study 
is  a  rate-compatible  punctured  convolutional  (RCPC)  coder  [60].  At  the  receiver,  a  list  Viterbi 
decoder  with  100  paths  is  used  to  choose  the  path  with  the  lowest  path  metric  that  also  satisfies 
the  checksum  equations. 

In our study, we also use an RCPC encoder to channel encode SPIHT’s output bit stream. The
RCPC coder is implemented using a (3, 1, 2) convolutional encoder with transfer function matrix
$G(D) = [1 + D, \; 1 + D^2, \; 1 + D + D^2]$. The critical issue in this joint source and channel coder design
is the rate allocation problem. For a given channel probability of error and a given overall rate,
the source rate and channel rate that maximize the PSNR must be determined. Once they are
determined, a convenient puncturing rule is employed. The optimum rate allocation for
the system has been found using simulations. To make a performance comparison with the scheme
explained in the previous section, we have fixed the overall rate to 0.463 bpp, which is the same as
the overall rate in the previous scheme, and distributed this rate between the source coder and the
channel coder.

Simulations have shown that

• for $p < 10^{-3}$, a rate 4/6 RCPC encoder with puncturing rule $a_1$

• for $p = 2 \times 10^{-3}$, a rate 4/7 RCPC encoder with puncturing rule $a_2$

• for $p = 5 \times 10^{-3}$, a rate 4/9 RCPC encoder with puncturing rule $a_3$

• for $p = 10^{-2}$, a rate 4/10 RCPC encoder with puncturing rule $a_4$

• for $p > 10^{-2}$, a rate 4/12 RCPC encoder (the unpunctured rate-1/3 mother code, which is the
lowest rate available with this coder) with puncturing rule $a_5$

are suitable choices. For these rates, the corresponding source coding rates and puncturing
rules are shown in Table 10, and the puncturing rule matrices are given below.

Table 10: Rate allocation for the SPIHT + RCPC system.

Rate Allocation   Source Coding Rate                Channel Coding Rate     Overall Rate
Scheme                                              and Puncturing Rule
Scheme I          $R_s$ = 0.463/1.5 = 0.309 bpp     4/6 PCC, $a_1$          $R_t$ = 0.463 bpp
Scheme II         $R_s$ = 0.463/1.75 = 0.265 bpp    4/7 PCC, $a_2$          $R_t$ = 0.463 bpp
Scheme III        $R_s$ = 0.463/2.25 = 0.206 bpp    4/9 PCC, $a_3$          $R_t$ = 0.463 bpp
Scheme IV         $R_s$ = 0.463/2.5 = 0.185 bpp     4/10 PCC, $a_4$         $R_t$ = 0.463 bpp
Scheme V          $R_s$ = 0.463/3 = 0.154 bpp       4/12 PCC, $a_5$         $R_t$ = 0.463 bpp

(The entries of $a_1$ are not recoverable from this copy.)

$$a_2 = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}, \qquad
a_3 = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix},$$

$$a_4 = \begin{pmatrix} 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \end{pmatrix}, \qquad
a_5 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}.$$
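
How a puncturing rule turns the rate-1/3 mother code into a higher-rate code can be sketched in Python as follows. The sketch uses the generator matrix $G(D) = [1 + D, 1 + D^2, 1 + D + D^2]$ and the rate-4/7 puncturing rule given in the text. The function names are hypothetical, and the convention that rows of the puncturing matrix correspond to encoder outputs and columns to time instants within a period of four is an assumption. Termination (flush) bits are omitted for brevity.

```python
from fractions import Fraction

# Rate-4/7 puncturing rule from the text. Assumed convention:
# rows = encoder outputs, columns = time index modulo the period (4).
A2 = [[1, 0, 1, 1],
      [0, 1, 0, 1],
      [1, 0, 1, 0]]

def conv_encode(bits):
    """Rate-1/3 encoder for G(D) = [1+D, 1+D^2, 1+D+D^2] (memory 2)."""
    s1 = s2 = 0                      # s1 = u[t-1], s2 = u[t-2]
    out = []
    for u in bits:
        out.append((u ^ s1, u ^ s2, u ^ s1 ^ s2))
        s1, s2 = u, s1
    return out

def puncture(coded, pattern):
    """Keep only the coded bits whose pattern entry is 1."""
    period = len(pattern[0])
    sent = []
    for t, triple in enumerate(coded):
        for r, bit in enumerate(triple):
            if pattern[r][t % period]:
                sent.append(bit)
    return sent

def code_rate(pattern):
    """Information bits per transmitted bit over one puncturing period."""
    return Fraction(len(pattern[0]), sum(sum(row) for row in pattern))

info = [1, 0, 1, 1]
coded = conv_encode(info)            # 12 coded bits
sent = puncture(coded, A2)           # 7 bits survive puncturing
rate = code_rate(A2)                 # 4/7
source_rate = 0.463 * float(rate)    # bpp left for the source coder
print(sent, rate, round(source_rate, 3))
```

For a fixed overall rate of 0.463 bpp, multiplying by the channel code rate gives the rate left for the source coder, which is the division by 1.5, 1.75, and so on, used in Table 10.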


The performance of the proposed system is shown in Figure 42. We used the same two
rates that were used in the previous section, namely $R = 0.463$ bpp and $R = 0.818$ bpp. In Figures
43 and 44, the performance of this system and that of the system proposed in the previous section
are compared for the same rate pairs. Note that by using SPIHT for the source coder and RCPC
codes for the channel coder, we can achieve any desired rate. In other words, exact rate control is
possible, which is not true for the previous system. It is clear that this new system has superior
performance compared to the previous joint source channel coding scheme. It owes its higher
performance to the efficient source coding scheme, namely SPIHT. It should be noted, on the other
hand, that an efficient data compression system removes, to the extent possible, the redundancy in
the source and retains the useful (nonredundant) part in an effort to reduce the rate. This removal
of redundancy, in turn, can introduce a great deal of sensitivity to channel errors, which, if not
dealt with properly, can significantly degrade the performance of the compression scheme. SPIHT,
as a very efficient source coding scheme, has been observed to be very sensitive to channel errors:
total collapse of the reconstructed image is possible even if a single transmitted bit is incorrectly
decoded at the receiver. The channel coder, on the other hand, is very simple compared to the
previous method’s variable-length list Viterbi decoder with MAP metric. It can be further improved,
as in Sherwood and Zeger’s work [59], for better error-correcting performance.


Figure 42: Performance comparison of SPIHT with punctured convolutional coding at different
rates.


Figure 43: Performance comparison of SPIHT with punctured convolutional coding to a
concatenated subband joint-source channel coding scheme.

Figure 44: Performance comparison of SPIHT with punctured convolutional coding to a
concatenated subband joint-source channel coding scheme with a nonbinary convolutional encoder.

5.6  Channel-Optimized  Vector  Quantization 

Vector quantization (VQ) [49], as a means of data compression, has received a tremendous amount
of attention in the past decade. Utilizing vector quantization instead of scalar quantization has
resulted in dramatic performance improvements (in terms of rate reduction for a given level of
distortion, or reduction of distortion for a given rate) in various image coding applications. The
superiority of VQ follows from Shannon’s source coding theorem [21], which states in essence that
encoding sequences of source outputs can provide an advantage over encoding individual samples.
This result, proved by taking longer and longer sequences of inputs, indicates that a quantization
strategy that works with sequences or blocks of outputs provides some improvement in performance
over scalar quantization. This fact is the main motivation behind vector quantization.

In vector quantization, the source output is grouped into blocks or vectors. In image coding, for
example, a vector is composed of $k$ pixels. These pixels are chosen to be neighboring pixels in the
two-dimensional image plane. This vector of source outputs forms the input to the vector quantizer.
At both the VQ encoder and decoder, there is a set of $k$-dimensional vectors which is called the
codebook of the vector quantizer. The codebooks at the encoder and the decoder are the same,
and contain $M$ $k$-dimensional vectors, called the code-vectors.

The operation of a vector quantizer is briefly as follows: The encoder forms the $k$-dimensional
vectors, and compares each vector to each of the $M$ code-vectors in the codebook. For a given
input vector, it finds the code-vector (amongst the $M$ possible code-vectors in the codebook) that
is closest to the input vector. Next, it transmits the index of the code-vector it has matched to the
input vector. For a codebook of size $M$, the index is a $\log_2 M$-bit number. The encoder and the
decoder use these $\log_2 M$ bits to identify a vector of $k$ pixels. In other words, the number of bits
per pixel for VQ with codebook size $M$ and vector size $k$ is $\frac{\log_2 M}{k}$. This is the rate of this
vector quantizer. Finally, the decoder receives the index values. Since it has the same codebook
as the encoder, it matches the index values with the corresponding code-vectors, and reconstructs
the image using them.

One of the main advantages of vector quantization is its simplicity. The encoder simply compares
the input vector with the $M$ code-vectors in the codebook, and transmits the index
of the closest code-vector. The operation of the decoder is even simpler: it receives the index and
finds the corresponding code-vector using a lookup table.
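
The encoder search, the decoder lookup, and the rate formula can be sketched in Python as follows. The squared-error distance, the toy codebook, and the function names are assumptions made for illustration; a vector of $k$ pixels is represented as a tuple.

```python
import math

def sq_dist(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def vq_encode(vectors, codebook):
    """For each input vector, return the index of the closest code-vector."""
    return [min(range(len(codebook)), key=lambda i: sq_dist(x, codebook[i]))
            for x in vectors]

def vq_decode(indices, codebook):
    """Table lookup: replace each index by its code-vector."""
    return [codebook[i] for i in indices]

def rate_bpp(M, k):
    """Bits per pixel for codebook size M and vector size k."""
    return math.log2(M) / k

# Toy codebook with M = 4 code-vectors of k = 4 pixels (e.g. 2 x 2 blocks).
codebook = [(0, 0, 0, 0), (64, 64, 64, 64),
            (128, 128, 128, 128), (255, 255, 255, 255)]
blocks = [(10, 5, 0, 12), (120, 130, 140, 125), (250, 240, 255, 255)]

indices = vq_encode(blocks, codebook)
decoded = vq_decode(indices, codebook)
print(indices)            # [0, 2, 3]
print(rate_bpp(4, 4))     # 0.5 bpp for this toy codebook
print(rate_bpp(16, 4))    # 1.0 bpp
```

The encoder cost grows with $M$ because every input vector is compared against the whole codebook, whereas the decoder cost is a single lookup per index.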

The main challenge in the implementation of vector quantizers is the codebook design. Determining
the code-vectors that yield the minimum distortion for a given rate is a
nontrivial problem. Various approaches to the design of codebooks for vector quantizers have been
proposed in the literature. An efficient and practical method of codebook design for VQ is the
Linde-Buzo-Gray (LBG) algorithm [49], which has been widely used in VQ applications. The LBG
algorithm is an iterative method of designing VQ codebooks in which the distortion is reduced
gradually at each iteration. The algorithm operates on a training set of vectors $\{X_n\}_{n=1}^{N}$, from
which the codebook is formed. It is therefore important that the source and the training set be
statistically alike for good coding performance. The LBG algorithm recursively finds and updates
the quantization regions and the code-vectors (called reconstruction vectors), denoted by
$\{V_i^{(k)}\}_{i=1}^{M}$ and $\{Y_i^{(k)}\}_{i=1}^{M}$, respectively, at the $k$-th step. The overall distortion at the $k$-th step,
$D^{(k)}$, is evaluated between the training vectors and the corresponding reconstruction vectors. The
update of the quantization regions and the reconstruction vectors continues until the relative change
in the average distortion falls below a threshold $\epsilon$. The algorithm, using the notation above, is
summarized as follows:

1. Set $k = 0$. Start with an initial set of reconstruction vectors $\{Y_i^{(0)}\}_{i=1}^{M}$ and a set of training
vectors $\{X_n\}_{n=1}^{N}$. Set $D^{(-1)} = 0$. Choose a threshold $\epsilon$.

2. Update the quantization regions

$$V_i^{(k)} = \{X_n : d(X_n, Y_i^{(k)}) < d(X_n, Y_j^{(k)}) \;\; \forall j \neq i\}, \quad i = 1, 2, \ldots, M. \qquad (8)$$

3. Compute the average distortion $D^{(k)}$ between the training vectors and the corresponding
reconstruction vectors.

4. If $\frac{|D^{(k)} - D^{(k-1)}|}{D^{(k)}} < \epsilon$, stop; otherwise, continue.

5. $k = k + 1$. Update the reconstruction vectors $\{Y_i^{(k)}\}_{i=1}^{M}$ that are the average values of the
elements of each of the quantization regions $V_i^{(k-1)}$. Go to Step 2.
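
The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: the distortion is squared error, the stopping rule is the relative change in average distortion with the previous distortion initialized to zero, ties are resolved in favor of the lowest index, and an empty region simply keeps its old reconstruction vector. All function names are hypothetical.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def centroid(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(v[d] for v in vectors) / n for d in range(len(vectors[0])))

def lbg(training, codebook, eps=1e-3, max_iter=100):
    """Refine `codebook` on `training`; return (codebook, average distortion)."""
    codebook = [tuple(c) for c in codebook]
    prev_d = 0.0                                  # Step 1
    d = 0.0
    for _ in range(max_iter):
        # Step 2: assign every training vector to its nearest code-vector.
        regions = [[] for _ in codebook]
        total = 0.0
        for x in training:
            i = min(range(len(codebook)), key=lambda m: sq_dist(x, codebook[m]))
            regions[i].append(x)
            total += sq_dist(x, codebook[i])
        d = total / len(training)                 # Step 3
        # Step 4: stop when the relative change in distortion is small.
        if d == 0.0 or abs(d - prev_d) / d < eps:
            break
        prev_d = d
        # Step 5: move each code-vector to the centroid of its region.
        codebook = [centroid(r) if r else c for r, c in zip(regions, codebook)]
    return codebook, d

training = [(0, 0), (0, 2), (10, 10), (10, 12)]
final_codebook, final_d = lbg(training, [(0, 0), (10, 10)])
print(final_codebook, final_d)   # [(0.0, 1.0), (10.0, 11.0)] 1.0
```

Because each iteration only reassigns training vectors and recomputes centroids, the result depends on the initial codebook passed in.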

There are some problems in the implementation of the LBG algorithm. One arises in the first step
of the algorithm. The performance of the LBG algorithm depends strongly on the initial choice of
the reconstruction vectors. Different initial conditions selected for the reconstruction vectors yield
different distortion values at each step. Therefore, the initialization of the reconstruction
vectors has a major impact on the performance of VQ. One of the most commonly used ways of
initializing the LBG algorithm is the splitting technique [49]. In this technique, the LBG algorithm
starts with a single reconstruction vector (which is the average value of all the training vectors).
At each step, the number of reconstruction vectors is doubled by adding to each of them a
perturbation vector. With the doubling of reconstruction vectors, the quantization regions and
reconstruction vectors are updated using the LBG algorithm. The splitting technique is a practical
and efficient way of initializing the LBG algorithm.
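
One splitting step can be sketched in Python as follows. The fixed perturbation `delta` added to every component and the function names are assumptions made for illustration; in practice each split would be followed by LBG iterations before the codebook is split again.

```python
def centroid(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(v[d] for v in vectors) / n for d in range(len(vectors[0])))

def split(codebook, delta=1.0):
    """Double the codebook: keep each vector and add a perturbed copy of it."""
    doubled = []
    for c in codebook:
        doubled.append(tuple(c))
        doubled.append(tuple(v + delta for v in c))
    return doubled

training = [(0, 0), (0, 2), (10, 10), (10, 12)]

one = [centroid(training)]   # single starting vector: the training-set mean
two = split(one)             # two vectors; LBG iterations would refine these
print(one)                   # [(5.0, 6.0)]
print(two)                   # [(5.0, 6.0), (6.0, 7.0)]
```

Repeating the split and refine cycle yields codebooks of size 1, 2, 4, and so on, so the technique naturally produces codebook sizes that are powers of two.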

Another  problem  in  the  implementation  of  the  LBG  algorithm  comes  into  play  in  the  second  step 
of  the  algorithm.  It  is  possible  that,  after  updating  the  quantization  regions,  one  (or  more)  of  the 
new  quantization  regions  may  be  empty.  One  way  of  getting  around  this  problem  is  to  remove  the 
reconstruction  vector  with  no  training  vector  associated  with  it,  and  to  assign  one  of  the  training 
vectors  that  belongs  to  the  reconstruction  vector  with  the  greatest  number  of  training  vectors  as 
the  new  reconstruction  vector. 

For an extensive discussion of these and other problems in vector quantization, see [50].

While vector quantization is an efficient source coding technique (as compared to scalar
quantization), its performance may be expected to be poorer than that of scalar quantization in noisy
channels. This may be justified as follows: a source coding technique becomes more efficient as
it removes more redundancy from the source. As the redundancy is removed, the remaining
nonredundant part of the source (the source encoded data) becomes more sensitive to channel errors.
The excessive vulnerability of SPIHT, a very efficient source coding technique, to transmission errors
is an example of this phenomenon. A similar discussion of the sensitivity of efficient source coding
schemes to bit errors was given in the previous section.

In  an  effort  to  increase  VQ’s  performance  in  noisy  channels,  a  number  of  techniques  have  been 
proposed  in  the  literature  [51]  -  [56].  In  this  section,  we  implement  the  channel-optimized  vector 
quantizer  proposed  in  [54]. 

The  objective  in  the  design  of  the  channel-optimized  vector  quantizer  is  to  introduce  the  effect  of 
the  channel  in  the  VQ  design  process  in  order  to  maximize  the  performance  of  the  vector  quantizer 
for  noisy  channels.  For  this  purpose,  the  distortion  measure  is  modified  to  account  for  the  channel 
error  probability.  The  design  of  a  channel-optimized  vector  quantizer  is  briefly  explained  below. 

A $k$-dimensional, $M$-level VQ is designed for a discrete memoryless channel (DMC) with input and
output alphabets $\{1, 2, \ldots, M\}$. $P(j|i)$ is the probability that index $j$ is received given that index $i$
is transmitted. The $k$-dimensional source output vector is denoted as $x = (x_{nk}, x_{nk+1}, \ldots, x_{nk+k-1})$
and has a $k$-fold probability density function (p.d.f.), $p(x)$.

The encoder is described in terms of a partition of quantization regions $\mathcal{P} = \{S_1, S_2, \ldots, S_M\}$
according to the following encoder mapping:

$$\gamma(x) = i, \quad \text{if } x \in S_i, \quad i \in \{1, 2, \ldots, M\}. \qquad (9)$$

The decoder is described in terms of the codebook $\mathcal{C} = \{c_1, c_2, \ldots, c_M\}$ according to the following
decoder mapping:

$$g(j) = c_j, \quad j \in \{1, 2, \ldots, M\}. \qquad (10)$$

Let the distortion caused by representing the source vector $x$ by a code-vector $y$ be denoted by
$d(x, y)$. The average distortion, $D$, of this system is

$$D = \sum_{i=1}^{M} \sum_{j=1}^{M} P(j|i) \int_{S_i} p(x)\, d(x, c_j)\, dx. \qquad (11)$$

Note that the expression of the average distortion given above accounts for the distortion that
arises from the source encoder as well as the channel.

The goal in channel-optimized vector quantization (COVQ) design is to choose the codebook $\mathcal{C}$ and
quantization regions $\mathcal{P}$ to minimize the average distortion $D$. It has been shown in [54] that the
optimal quantization regions and the optimal codebook for the mean squared error distortion
measure are given by

$$S_i^* = \Big\{ x : \sum_{j=1}^{M} P(j|i)\, \|x - c_j\|^2 \le \sum_{j=1}^{M} P(j|l)\, \|x - c_j\|^2, \;\; \forall l \Big\}, \quad i \in \{1, 2, \ldots, M\}, \qquad (12)$$

$$c_j^* = \frac{\sum_{i=1}^{M} P(j|i) \int_{S_i} x\, p(x)\, dx}{\sum_{i=1}^{M} P(j|i) \int_{S_i} p(x)\, dx}, \quad j \in \{1, 2, \ldots, M\}. \qquad (13)$$

Equations (12) and (13) are applied recursively: in (12), the quantization regions are evaluated.
These quantization regions are then used in (13) to find the code-vectors, which are in turn used
in (12) to find the quantization regions. It should be noted that both equations contain the term $P(j|i)$,
which introduces the channel effect into the design of the COVQ. In this sense, the channel-optimized
vector quantizer falls into the category of constrained joint source and channel coders, where the
source coders are modified to account for the presence of a noisy channel. In COVQ, the vector
quantizer is optimized subject to a noisy channel constraint.
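
One iteration of the recursion between (12) and (13) can be sketched in Python by replacing the integrals over $p(x)$ with sums over a training set. The sketch assumes a binary symmetric channel with bit error probability $p$, indices sent as natural binary words (so that $P(j|i)$ depends only on the Hamming distance between $i$ and $j$), indices numbered from 0, and a codebook size that is a power of two. All function names are hypothetical.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def bsc_transition(M, p):
    """P[i][j] = P(j|i) for log2(M)-bit indices sent over a BSC."""
    nbits = (M - 1).bit_length()
    return [[(p ** bin(i ^ j).count("1")) *
             ((1 - p) ** (nbits - bin(i ^ j).count("1")))
             for j in range(M)] for i in range(M)]

def covq_encode(x, codebook, P):
    """Eq. (12): choose the index i minimizing the expected distortion."""
    M = len(codebook)
    return min(range(M),
               key=lambda i: sum(P[i][j] * sq_dist(x, codebook[j])
                                 for j in range(M)))

def covq_update(training, codebook, P):
    """Eq. (13) with integrals replaced by sums over the training set."""
    M, dim = len(codebook), len(codebook[0])
    sums = [[0.0] * dim for _ in range(M)]
    counts = [0] * M
    for x in training:
        i = covq_encode(x, codebook, P)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += x[d]
    new_codebook = []
    for j in range(M):
        den = sum(P[i][j] * counts[i] for i in range(M))
        if den == 0:
            new_codebook.append(tuple(codebook[j]))   # keep the old vector
        else:
            new_codebook.append(tuple(
                sum(P[i][j] * sums[i][d] for i in range(M)) / den
                for d in range(dim)))
    return new_codebook

training = [(0, 0), (0, 2), (10, 10), (10, 12)]
start = [(0, 0), (10, 10)]
noiseless = covq_update(training, start, bsc_transition(2, 0.0))
noisy = covq_update(training, start, bsc_transition(2, 0.1))
print(noiseless)   # ordinary centroids: (0, 1) and (10, 11)
print(noisy)       # code-vectors pulled toward each other
```

With $p = 0$ the transition matrix is the identity and the update reduces to the ordinary centroid rule; with $p > 0$ each code-vector becomes a weighted average over all regions, which reduces the damage caused when an index is received in error.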


The channel-optimized vector quantizer as well as the regular vector quantizer have been simulated
for the test image in Figure 32. Different codebooks have been obtained for different channel error
probabilities using COVQ. The channel is a BSC and the training image is different from the test
image. The codebook size, $M$, is 16, and the vectors are of dimension $2 \times 2$. Thus, the rate is
$\log_2 16 / 4 = 1$ bpp. The results are depicted in Figure 45.


One observation that can be made from the figure is that the COVQ performs slightly worse
than VQ for very low channel error probabilities. As the channel error rate increases, the COVQ
performs better. For $p = 10^{-3}$, the COVQ and VQ have almost the same PSNR performance.
For higher error probabilities, the COVQ surpasses VQ. For $p = 0.02$, for example, the COVQ
outperforms the VQ by about 1.8 dB.

As the codebook size increases, the codebook design of COVQ becomes computationally burdensome.
Also, since the number of code-vectors increases, the code-vectors get closer to each other in the
Euclidean sense, and the COVQ no longer performs significantly better than VQ. These
observations were made in simulations with $M = 256$ and a vector size of $4 \times 4$, which yields a rate of
$\log_2 256 / 16 = 0.5$ bpp.

Figure 45: Performance comparison of a vector quantization (VQ) scheme to a channel optimized
vector quantization scheme (COVQ).

Next, the performance of the COVQ has been investigated for the case where the encoder and the
decoder have a wrong estimate of the channel error probability. This is called channel mismatch. Let
$p$ denote the actual channel error probability and $\hat{p}$ denote the estimated channel error probability.
For $p = 10^{-2}$ and $\hat{p} = 10^{-1}$, which is an overestimation by a factor of 10, the PSNR obtained
is 26.501 dB, which is a drop of 1.51 dB. This value is even worse than that of regular VQ by 0.488 dB.
On the other hand, for the same channel error probability, if $\hat{p} = 10^{-3}$, which is an underestimation
by a factor of 10, the PSNR is 27.854 dB, which is a drop of only 0.157 dB. Moreover, this value is
still higher than that of VQ by 0.915 dB. Therefore, the penalty for overestimating the channel
error probability is significantly higher than that for underestimating it.